Posted to user@oozie.apache.org by Tim Chan <ti...@chan.net> on 2012/03/07 02:56:58 UTC

using DataSet to define a date range of input directories

I would like to specify a date range and have Oozie feed it to my workflow
as a list of input directories.

For example, my input data is stored in this fashion:

mydata/${YEAR}/${MONTH}/${DAY}

I would like to specify date ranges that aren't whole months, for example:

Jan 13 - Feb 3.

Re: using DataSet to define a date range of input directories

Posted by Harshal <ha...@komli.com>.
Hi Tim,

We use the same technique. We also use it to pass any runtime variables.

Regards,




Re: using DataSet to define a date range of input directories

Posted by Tim Chan <ti...@chan.net>.
Hi Mohammad,

I understand what you've described and will make an attempt tomorrow. Thank
you.





-- 
Tim Chan // tim@chan.net // 213.784.2523

Re: using DataSet to define a date range of input directories

Posted by Mohammad Islam <mi...@yahoo.com>.
Hi Tim,
This is not directly supported at the moment.
I would suggest writing a Java action and making it the first action of your DAG.
Pass the date range into the Java action. In your Java code, do whatever processing you need and write the properties (which will be consumed by subsequent actions) to a pre-defined file. Finally, refer to that variable in your next action as the input directory.

One such example can be found at:
https://github.com/yahoo/oozie/wiki/Oozie-WF-use-cases
(search for "Java-Main Action").

Please let us know if you need more help.
Regards,
Mohammad
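
A minimal sketch of the approach described above, assuming the Java action is configured with <capture-output/>, in which case Oozie supplies the path of the properties file through the oozie.action.output.properties system property. The class name DateRangeDirs, the inputDirs key, the zero-padded directory names, and the use of java.time (Java 8 or later) are illustrative assumptions and do not come from the thread:

```java
import java.io.File;
import java.io.FileOutputStream;
import java.io.IOException;
import java.io.OutputStream;
import java.time.LocalDate;
import java.util.ArrayList;
import java.util.List;
import java.util.Properties;

// Hypothetical helper: the class name and the "inputDirs" key are illustrative.
final class DateRangeDirs {

    // Expand an inclusive date range into one directory per day, following
    // the mydata/${YEAR}/${MONTH}/${DAY} layout (zero padding is an assumption).
    static List<String> expand(String base, LocalDate start, LocalDate end) {
        List<String> dirs = new ArrayList<>();
        for (LocalDate d = start; !d.isAfter(end); d = d.plusDays(1)) {
            dirs.add(String.format("%s/%04d/%02d/%02d",
                    base, d.getYear(), d.getMonthValue(), d.getDayOfMonth()));
        }
        return dirs;
    }

    // Usage: DateRangeDirs <base> <start yyyy-MM-dd> <end yyyy-MM-dd>
    public static void main(String[] args) throws IOException {
        List<String> dirs = expand(args[0],
                LocalDate.parse(args[1]), LocalDate.parse(args[2]));
        String joined = String.join(",", dirs);

        // With <capture-output/>, Oozie passes the output file path here.
        String outPath = System.getProperty("oozie.action.output.properties");
        if (outPath == null) {
            // Not running under Oozie: just print the list.
            System.out.println(joined);
            return;
        }

        // Write the list as a Java properties file for later actions to read.
        Properties props = new Properties();
        props.setProperty("inputDirs", joined);
        try (OutputStream os = new FileOutputStream(new File(outPath))) {
            props.store(os, null);
        }
    }
}
```

A later action could then read the list with an EL expression such as ${wf:actionData('expand-dates')['inputDirs']}, where expand-dates is a hypothetical name for the Java action node. Captured output is size-limited by the Oozie server configuration, so very long ranges may need a different hand-off.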
 




Re: using DataSet to define a date range of input directories

Posted by Tim Chan <ti...@chan.net>.
Hi Mohammad,

For this scenario, let's say it is a fixed date range, meaning that I will
specify the start and end dates manually.
I do not need to have the job wait. We can assume that the input files will
be present.




-- 
Tim Chan // tim@chan.net // 213.784.2523

Re: using DataSet to define a date range of input directories

Posted by Mohammad Islam <mi...@yahoo.com>.
Hi Tim,
Is it a fixed date range or a relative one?
If it is relative, how do you define it?

Does the range have a fixed or a variable length?

Do you want the job to wait until the data for those days is available and then launch the workflow with those directories?


Regards,
Mohammad



