Posted to dev@oozie.apache.org by "purshotam shah (JIRA)" <ji...@apache.org> on 2013/09/24 00:09:06 UTC

[jira] [Updated] (OOZIE-1554) Support variables for coord data-in/data-out dataset

     [ https://issues.apache.org/jira/browse/OOZIE-1554?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

purshotam shah updated OOZIE-1554:
----------------------------------

    Description: 
One would like to have a centralized list of datasets
and use the <include> tag to make them available to every coordinator.  One
would also like to re-use coordinator code, as most of the processing follows
the same steps, but with differing input and output feeds.

To do this, one needs to be able to set the data-in and data-out dataset
values to variables.


My bundle coordinator entry looks like this:
   <coordinator name="data1-2">
        <app-path>/user/harveyc/oozie_test/src/test_coordA.xml</app-path>
        <configuration>
            <property><name>wf_name</name><value>1-2</value></property>
            <property><name>dataset_A</name><value>dataA</value></property>
            <property><name>dataset_B</name><value>dataB</value></property>
        </configuration>
    </coordinator>


The coordinator looks like this:
<coordinator-app name="COORD_A_TEST" frequency="${coord:minutes(1)}"
start="${startTime}" end="${endTime}" timezone="${timezoneCode}" 
xmlns:sla="uri:oozie:sla:0.1" xmlns="uri:oozie:coordinator:0.2">
  <datasets>
    <include>${nameNode}/user/harveyc/oozie_test/datasets/test_datasets.xml</include>
  </datasets>
  <input-events>
      <data-in name="inputDataA" dataset="${dataset_A}">
        <instance>${coord:current(0)}</instance>
      </data-in>
  </input-events>
  <output-events>
      <data-out name="outputDataB" dataset="${dataset_B}">
        <instance>${coord:current(0)}</instance>
      </data-out>
  </output-events>
  <action>
   <workflow>
      <app-path>/user/harveyc/oozie_test/src/wf_touchz.xml</app-path>
       <configuration>
         <property><name>name</name><value>${wf_name}</value></property>
         <property><name>touchzpathb</name><value>${coord:dataOut('outputDataB')}</value></property>
       </configuration>
    </workflow>
  </action>     
</coordinator-app>

test_datasets.xml looks like this:
  <datasets>
      <dataset name="dataA" frequency="${coord:minutes(1)}"
               initial-instance="${ds_startTime}" timezone="${timezoneCode}">
        <uri-template>${nameNode}/user/harveyc/oozie_test/data1/${YEAR}${MONTH}${DAY}${HOUR}${MINUTE}</uri-template>
      </dataset>
      <dataset name="dataB" frequency="${coord:minutes(1)}"
               initial-instance="${ds_startTime}" timezone="${timezoneCode}">
        <uri-template>${nameNode}/user/harveyc/oozie_test/data2/${YEAR}${MONTH}${DAY}${HOUR}${MINUTE}</uri-template>
      </dataset>
  </datasets>

The error is:
Error: Invalid workflow-app, org.xml.sax.SAXParseException; lineNumber: 8;
columnNumber: 57; cvc-pattern-valid: Value '${dataset_A}' is not facet-valid
with respect to pattern '([a-zA-Z]([\-_a-zA-Z0-9])*){1,39}' for type
'IDENTIFIER'.
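
The rejection can be reproduced outside Oozie. The sketch below copies the
pattern from the error message and applies it to the dataset names and to the
unresolved variable references. It assumes Python's re module is a close
enough stand-in for the XML Schema regex engine for this particular pattern,
and it uses a full match because a schema pattern facet must match the whole
value. The helper name is_valid_identifier is made up for illustration and is
not part of Oozie.

```python
import re

# Pattern copied verbatim from the error message (facet of type IDENTIFIER).
IDENTIFIER_PATTERN = re.compile(r"([a-zA-Z]([\-_a-zA-Z0-9])*){1,39}")


def is_valid_identifier(value: str) -> bool:
    """Return True if the entire value matches the IDENTIFIER pattern.

    Hypothetical helper: fullmatch approximates the schema rule that a
    pattern facet applies to the whole attribute value.
    """
    return IDENTIFIER_PATTERN.fullmatch(value) is not None


if __name__ == "__main__":
    # Literal dataset names versus the unresolved variable references.
    for candidate in ("dataA", "dataB", "${dataset_A}", "${dataset_B}"):
        print(candidate, is_valid_identifier(candidate))
```

The literal names dataA and dataB match, while ${dataset_A} and ${dataset_B}
fail on their first character, which is consistent with the literal
'${dataset_A}' quoted in the error above.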

    
> Support variables for coord data-in/data-out dataset	
> -----------------------------------------------------
>
>                 Key: OOZIE-1554
>                 URL: https://issues.apache.org/jira/browse/OOZIE-1554
>             Project: Oozie
>          Issue Type: Bug
>            Reporter: purshotam shah
