Posted to dev@oozie.apache.org by "Mona Chitnis (JIRA)" <ji...@apache.org> on 2013/08/23 02:09:52 UTC

[jira] [Commented] (OOZIE-1504) parameterize coord EL functions (latest and current)

    [ https://issues.apache.org/jira/browse/OOZIE-1504?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13748083#comment-13748083 ] 

Mona Chitnis commented on OOZIE-1504:
-------------------------------------

Some additional scope for this enhancement:

1) A coordinator with an hourly frequency that, every hour, takes another hourly
feed as input: the current hour's instance plus all previous instances of the same day.
The input feed is xyz.

First instance of coordinator – 2013080101  -->  it takes feed xyz's 0th hr and
also feed xyz's 1st hr (i.e. current plus all previous)
<start-instance>${coord:current(-1)}</start-instance>
<end-instance>${coord:current(0)}</end-instance>

Second instance of coordinator – 2013080102  -->  it takes feed xyz's 0th hr,
1st hr and 2nd hr (i.e. current plus all previous)
<start-instance>${coord:current(-2)}</start-instance>
<end-instance>${coord:current(0)}</end-instance>

..and so on..
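
To make the pattern above concrete, here is a minimal sketch (plain Python, not Oozie code) of the start offset that would have to be passed to coord:current() for each hourly run. The helper names are hypothetical, and it assumes the nominal time is written in the yyyyMMddHH form used in the examples above.

```python
from datetime import datetime


def start_offset_for_hour(nominal_time):
    """Return the coord:current() start offset for an hourly run.

    nominal_time is a string such as "2013080101" (yyyyMMddHH, assumed).
    The offset is the negated hour of day, so the range reaches back
    to hour 0 of the same day.
    """
    hour = datetime.strptime(nominal_time, "%Y%m%d%H").hour
    return -hour


def instance_range(nominal_time):
    """Return the (start-instance, end-instance) EL strings for a run."""
    start = start_offset_for_hour(nominal_time)
    return (f"${{coord:current({start})}}", "${coord:current(0)}")


if __name__ == "__main__":
    for run in ("2013080101", "2013080102"):
        print(run, instance_range(run))
```

The offset changes with every run, which is why a single fixed literal in coordinator.xml cannot express this case.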

2) A workflow that processes input datasets for the past 30 days:

    <input-events>
        <data-in name="event_input_path_format1" dataset="EVENT_INPUT_FORMAT1">
            <start-instance>${coord:current(-30)}</start-instance>
            <end-instance>${coord:current(-1)}</end-instance>
        </data-in>
        <data-in name="event_input_path_format2" dataset="EVENT_INPUT_FORMAT2">
            <start-instance>${coord:current(-30)}</start-instance>
            <end-instance>${coord:current(-1)}</end-instance>
        </data-in>
    </input-events>

Instead, one wants the range to start on a fixed date, so that each day it
processes one more dataset than the previous day (the datasets are daily).
Something like the following is not supported:

    <input-events>
        <data-in name="event_input_path_format1" dataset="EVENT_INPUT_FORMAT1">
            <start-instance>2013-03-15T00:00Z</start-instance>
            <end-instance>${coord:current(-1)}</end-instance>
        </data-in>
        <data-in name="event_input_path_format2" dataset="EVENT_INPUT_FORMAT2">
            <start-instance>2013-03-15T00:00Z</start-instance>
            <end-instance>${coord:current(-1)}</end-instance>
        </data-in>
    </input-events>

Use case #2 would also require <instance> to support parameterization.
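
For use case #2, a rough sketch (plain Python, not Oozie code) of the coord:current() offset that would be equivalent to a fixed start date. The function name is hypothetical, and it assumes daily datasets whose instances line up with the coordinator's nominal day, so that current(0) is the nominal day itself.

```python
from datetime import date


def start_offset_for_fixed_date(nominal_day, fixed_start):
    """Return the offset n such that current(n) falls on fixed_start.

    Assumes daily dataset instances aligned with the nominal day.
    With end-instance at current(-1), the range start..end then
    covers -n instances.
    """
    if nominal_day <= fixed_start:
        raise ValueError("nominal day must be after the fixed start date")
    return -(nominal_day - fixed_start).days


if __name__ == "__main__":
    fixed = date(2013, 3, 15)
    for day in (date(2013, 3, 16), date(2013, 3, 20), date(2013, 4, 14)):
        print(day, start_offset_for_fixed_date(day, fixed))
```

Under these assumptions the run on 2013-04-14 is the one where the fixed-start range coincides with the current(-30) to current(-1) range shown earlier; on every later day the fixed-start range is longer.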
                
> parameterize coord EL functions (latest and current)
> ----------------------------------------------------
>
>                 Key: OOZIE-1504
>                 URL: https://issues.apache.org/jira/browse/OOZIE-1504
>             Project: Oozie
>          Issue Type: Improvement
>    Affects Versions: trunk
>            Reporter: Ryota Egashira
>             Fix For: trunk
>
>
> for example, coordinator.xml
> -----
>     <input-events>
>          <data-in name="foo" dataset="bar">
>             <start-instance>${coord:latest(-365)}</start-instance>
>             <end-instance>${coord:latest(0)}</end-instance>
>          </data-in>
>     </input-events>
> -----
> There are use cases for reusing the same coordinator.xml with a varying number of data instances (not always 365). But because the parameter to a coord EL function cannot be parameterized (via job
> config), the customer needs to copy coordinator.xml with a different number just to change the parameter of coord:latest(), which is not optimal for maintenance.
