Posted to dev@oozie.apache.org by Jaydeep Vishwakarma <ja...@inmobi.com> on 2015/04/23 12:42:40 UTC

Aperiodic data handling in Oozie

Hi All,

Currently, Oozie scheduling works on periodic datasets. It does not have any
mechanism to handle aperiodic datasets, which do not follow a fixed
schedule/frequency.


Use cases


   1. The incoming dataset arrives with no fixed schedule.
   2. A job needs to be triggered based on all data available since the last
   run, with a possible cap on the maximum size to process in one run.
   3. We want to avoid creating a large number of instances when the input
   instances are known to be very few.
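
To make use case 2 concrete, here is a minimal Python sketch of the selection
step. It is not Oozie code and not a proposed design; the names DataInstance,
select_batch, last_run_ts and max_bytes are hypothetical and only illustrate
picking everything that arrived since the last run, up to a size cap.

```python
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class DataInstance:
    """Hypothetical record for one piece of input data that has arrived."""
    path: str
    arrival_ts: int   # arrival time, e.g. epoch seconds
    size_bytes: int


def select_batch(instances: List[DataInstance],
                 last_run_ts: int,
                 max_bytes: int) -> List[DataInstance]:
    """Pick instances that arrived after last_run_ts, oldest first,
    stopping before the running total would exceed max_bytes."""
    pending = sorted(
        (i for i in instances if i.arrival_ts > last_run_ts),
        key=lambda i: i.arrival_ts,
    )
    batch: List[DataInstance] = []
    total = 0
    for inst in pending:
        if total + inst.size_bytes > max_bytes:
            break  # cap reached; remaining instances wait for the next run
        batch.append(inst)
        total += inst.size_bytes
    return batch
```

In this sketch a single instance larger than the cap would never be picked, so
a real design would need an explicit rule for that case.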


Regards,
Jaydeep

-- 
_____________________________________________________________
The information contained in this communication is intended solely for the 
use of the individual or entity to whom it is addressed and others 
authorized to receive it. It may contain confidential or legally privileged 
information. If you are not the intended recipient you are hereby notified 
that any disclosure, copying, distribution or taking any action in reliance 
on the contents of this information is strictly prohibited and may be 
unlawful. If you have received this communication in error, please notify 
us immediately by responding to this email and then delete it from your 
system. The firm is neither liable for the proper and complete transmission 
of the information contained in this communication nor for any delay in its 
receipt.

Re: Aperiodic data handling in Oozie

Posted by Jaydeep Vishwakarma <ja...@inmobi.com>.
Hi All,

I have put my thoughts in a document. I think this feature will give
Oozie a new dimension. Please have a look and provide your valuable feedback.
Please follow this JIRA for more detail:
https://issues.apache.org/jira/browse/OOZIE-2216

Regards,
Jaydeep
