Posted to user@oozie.apache.org by Peter Cseh <ge...@cloudera.com> on 2017/05/30 19:26:55 UTC

Re: Oozie Job + Version Control Tool

I'd like to move this conversation over to
https://issues.apache.org/jira/browse/OOZIE-2876 and its subtasks. It
looks like those action types are trying to solve these issues.
Please share your thoughts on this approach there.
Thanks!

gp


On Mon, Oct 31, 2016 at 8:12 AM, goun na <go...@gmail.com> wrote:

> maven-assembly-plugin is a good way to handle separate packaging. Thanks. :)
>
> 2016-10-20 15:05 GMT+09:00 Per Ullberg <pe...@klarna.com>:
>
> > Here's a pom, but I don't think it will tell you much.
> >
> >
> > <?xml version="1.0" encoding="UTF-8"?>
> > <project xmlns="http://maven.apache.org/POM/4.0.0"
> >          xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
> >          xsi:schemaLocation="http://maven.apache.org/POM/4.0.0 http://maven.apache.org/xsd/maven-4.0.0.xsd">
> >     <parent>
> >         <artifactId>hbase-to-hdfs-parent</artifactId>
> >         <groupId>com.klarna.datavault</groupId>
> >         <version>4.1.1-SNAPSHOT</version>
> >     </parent>
> >     <modelVersion>4.0.0</modelVersion>
> >
> >     <artifactId>hbase-to-hdfs</artifactId>
> >
> >     <dependencies>
> >
> > ...
> >
> >         <dependency>
> >             <groupId>com.github.spullara.mustache.java</groupId>
> >             <artifactId>compiler</artifactId>
> >             <version>0.8.18</version>
> >         </dependency>
> >
> >         <dependency>
> >             <groupId>joda-time</groupId>
> >             <artifactId>joda-time</artifactId>
> >             <version>2.8</version>
> >             <scope>test</scope>
> >         </dependency>
> >         <dependency>
> >             <groupId>com.klarna</groupId>
> >             <artifactId>hiverunner</artifactId>
> >             <version>2.6.0</version>
> >             <scope>test</scope>
> >             <exclusions>
> >                 <exclusion>
> >                     <artifactId>junit</artifactId>
> >                     <groupId>junit</groupId>
> >                 </exclusion>
> >                 <exclusion>
> >                     <groupId>org.apache.calcite</groupId>
> >                     <artifactId>calcite-avatica</artifactId>
> >                 </exclusion>
> >                 <exclusion>
> >                     <groupId>org.apache.calcite</groupId>
> >                     <artifactId>calcite-core</artifactId>
> >                 </exclusion>
> >             </exclusions>
> >         </dependency>
> >     </dependencies>
> >
> >
> >     <build>
> >         <resources>
> >             <resource>
> >                 <directory>src/main/resources</directory>
> >                 <filtering>true</filtering>
> >             </resource>
> >             <resource>
> >                 <directory>src/main/templates</directory>
> >                 <filtering>true</filtering>
> >             </resource>
> >         </resources>
> >
> >         <plugins>
> >             <plugin>
> >                 <groupId>org.apache.maven.plugins</groupId>
> >                 <artifactId>maven-assembly-plugin</artifactId>
> >                 <version>2.4</version>
> >                 <configuration>
> >                     <descriptor>assembly-descriptor.xml</descriptor>
> >                 </configuration>
> >                 <executions>
> >                     <execution>
> >                         <phase>package</phase>
> >                         <goals>
> >                             <goal>single</goal>
> >                         </goals>
> >                     </execution>
> >                 </executions>
> >             </plugin>
> >         </plugins>
> >     </build>
> >
> > </project>
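> >
> > The assembly-descriptor.xml that the plugin configuration points at is not
> > shown in this thread. A minimal sketch of what such a descriptor could look
> > like, assuming the package is a zip with the filtered workflow files at the
> > root and the jars under lib/ (where Oozie picks up workflow libraries).
> > The id, the include patterns and the layout are assumptions for
> > illustration, not the actual file behind this pom.
> >
> > ```xml
> > <assembly xmlns="http://maven.apache.org/plugins/maven-assembly-plugin/assembly/1.1.2"
> >           xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
> >           xsi:schemaLocation="http://maven.apache.org/plugins/maven-assembly-plugin/assembly/1.1.2 http://maven.apache.org/xsd/assembly-1.1.2.xsd">
> >     <!-- Hypothetical classifier appended to the artifact name -->
> >     <id>oozie</id>
> >     <formats>
> >         <format>zip</format>
> >     </formats>
> >     <includeBaseDirectory>false</includeBaseDirectory>
> >     <fileSets>
> >         <!-- Filtered resources and templates end up in target/classes -->
> >         <fileSet>
> >             <directory>${project.build.outputDirectory}</directory>
> >             <outputDirectory>/</outputDirectory>
> >             <includes>
> >                 <include>**/*.xml</include>
> >                 <include>**/*.hql</include>
> >             </includes>
> >         </fileSet>
> >     </fileSets>
> >     <dependencySets>
> >         <!-- Project jar and runtime dependencies go to lib/ -->
> >         <dependencySet>
> >             <outputDirectory>lib</outputDirectory>
> >             <scope>runtime</scope>
> >         </dependencySet>
> >     </dependencySets>
> > </assembly>
> > ```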
> >
> >
> > We use mustache a lot to materialize coordinators, workflows and hql,
> > both at build time and at runtime. A nice thing we came up with is that
> > some of our workflows have sub-workflows that get materialized by the
> > main workflow at runtime. That way, different sub-workflows can be
> > generated depending on the runtime configuration. As an example, we have
> > sub-workflows that load data to either Postgres or Kafka. These two ways
> > of loading are described in two separate sub-workflow templates, and the
> > right one is materialized at runtime depending on whether the user
> > configured a Postgres or a Kafka load.
> >
> > This way we can build workflows that are more reactive to the current
> > state of the cluster and configuration.
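> >
> > To illustrate the idea, here is a minimal sketch of what one of the two
> > sub-workflow templates (the Kafka one) might look like. The real templates
> > are not shown in this thread, so the file name, the node names and the
> > Mustache variables workflowName, kafkaLoaderMainClass and tables are all
> > hypothetical. Only kafkaConfigFilePath matches a property that the parent
> > workflow passes down. Mustache's {{...}} tags are filled in when the
> > template is materialized, while Oozie's ${...} expressions pass through
> > untouched and are resolved when the sub-workflow runs.
> >
> > ```xml
> > <!-- load_sub_workflow_kafka.xml.mustache (hypothetical file name) -->
> > <workflow-app name="{{workflowName}}" xmlns="uri:oozie:workflow:0.5">
> >     <start to="kafka_load"/>
> >     <action name="kafka_load">
> >         <java>
> >             <job-tracker>${jobTracker}</job-tracker>
> >             <name-node>${nameNode}</name-node>
> >             <!-- Filled in by Mustache at materialization time -->
> >             <main-class>{{kafkaLoaderMainClass}}</main-class>
> >             <!-- Resolved by Oozie from the propagated configuration -->
> >             <arg>${kafkaConfigFilePath}</arg>
> >             <!-- Mustache section: one arg per entry in the tables list -->
> >             {{#tables}}
> >             <arg>{{name}}</arg>
> >             {{/tables}}
> >         </java>
> >         <ok to="end"/>
> >         <error to="fail"/>
> >     </action>
> >     <kill name="fail">
> >         <message>Kafka load failed: ${wf:errorMessage(wf:lastErrorNode())}</message>
> >     </kill>
> >     <end name="end"/>
> > </workflow-app>
> > ```
> >
> > A Postgres template would have the same shape with a different action, and
> > the materializing step would pick which of the two files to render.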
> >
> > Here's an oozie snippet:
> >
> > <action name="materialize_load_sub_workflow" cred="hive_credentials">
> >     <java>
> >         <prepare>
> >             <mkdir path="${wf:appPath()}/load/"/>
> >         </prepare>
> >         <main-class>com.klarna.datavault.load.materialize.MaterializeLoadSubWorkflowMain</main-class>
> >         <!-- Output location for the materialized workflow XML -->
> >         <arg>-o</arg>
> >         <arg>${wf:appPath()}/load/load_sub_workflow-${wf:id()}.xml</arg>
> >         <!-- Output location for the materialized kafka.properties file for a Kafka load -->
> >         <arg>-k</arg>
> >         <arg>${wf:appPath()}/load/kafka.properties</arg>
> >         <arg>-w</arg>
> >         <arg>${workflowConfigPath}</arg>
> >         <arg>-d</arg>
> >         <arg>${TARGET_DB_LOCATION}</arg>
> >         <arg>-s</arg>
> >         <arg>${wf:appPath()}/load/hql-delta-transform-${wf:id()}.hql</arg>
> >         <arg>-n</arg>
> >         <arg>${TARGET_DB_NAME}</arg>
> >         <arg>-c</arg>
> >         <arg>${LOAD_DIFF_DB_LOCATION}</arg>
> >         <file>${HIVE_SITE_XML}</file>
> >         <capture-output/>
> >     </java>
> >     <ok to="execute_load"/>
> >     <error to="report_failure"/>
> > </action>
> >
> >
> > <action name="execute_load">
> >     <sub-workflow>
> >         <app-path>${wf:appPath()}/load/load_sub_workflow-${wf:id()}.xml</app-path>
> >         <propagate-configuration/>
> >         <configuration>
> >             <property>
> >                 <name>jobName</name>
> >                 <value>${jobName}-load</value>
> >             </property>
> >             <property>
> >                 <name>parentWorkflowAppPath</name>
> >                 <value>${wf:appPath()}</value>
> >             </property>
> >             <property>
> >                 <name>kafkaConfigFilePath</name>
> >                 <value>${wf:appPath()}/load/kafka.properties</value>
> >             </property>
> >             <property>
> >                 <name>hqlDeltaTransformationPath</name>
> >                 <value>load/hql-delta-transform-${wf:id()}.hql</value>
> >             </property>
> >         </configuration>
> >     </sub-workflow>
> >     <ok to="generate_view_script"/>
> >     <error to="report_failing_sub_wf"/>
> > </action>
> >
> >
> >
> > Sorry if I'm blabbing
> > /Pelle
> >
> > On Thu, Oct 20, 2016 at 3:52 AM, goun na <go...@gmail.com> wrote:
> >
> > > Per Ullberg, a snippet of pom.xml would help us. :)
> > > Thanks,
> > >
> > > 2016-10-20 3:36 GMT+09:00 Per Ullberg <pe...@klarna.com>:
> > >
> > > > @goun na: we keep one coordinator per (zip|war|jar)
> > > >
> > > > @shiva: I'm happy to share, but it's hard to know what you're in need of.
> > > > Ask and I will try to answer :)
> > > >
> > > > /Pelle
> > > >
> > > >
> > > > On Wednesday, October 19, 2016, Shiva Ramagopal <tr...@gmail.com> wrote:
> > > >
> > > > > Per,
> > > > >
> > > > > Your approach seems very interesting. Could you elaborate on it?
> > > > >
> > > > > Thanks,
> > > > > Shiva
> > > > >
> > > > > On Wed, Oct 19, 2016 at 2:19 PM, Per Ullberg <per.ullberg@klarna.com> wrote:
> > > > >
> > > > > > We package our oozie jobs with maven and release artifacts to nexus.
> > > > > > We keep the version number as part of the coordinator name. That way
> > > > > > we have full traceability between code base and running coordinators.
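> > > > > >
> > > > > > A minimal sketch of how the version could end up in the coordinator
> > > > > > name, assuming Maven resource filtering substitutes
> > > > > > ${project.version} when the package is built. The real coordinator
> > > > > > is not shown in this thread: the name, the frequency and the
> > > > > > startTime, endTime and workflowAppPath properties are hypothetical.
> > > > > >
> > > > > > ```xml
> > > > > > <!-- ${project.version} is replaced by Maven at build time. The other
> > > > > >      ${...} expressions are left for Oozie, assuming they are not
> > > > > >      defined as Maven properties. -->
> > > > > > <coordinator-app name="hbase-to-hdfs-${project.version}"
> > > > > >                  frequency="${coord:days(1)}"
> > > > > >                  start="${startTime}" end="${endTime}" timezone="UTC"
> > > > > >                  xmlns="uri:oozie:coordinator:0.4">
> > > > > >     <action>
> > > > > >         <workflow>
> > > > > >             <app-path>${workflowAppPath}</app-path>
> > > > > >         </workflow>
> > > > > >     </action>
> > > > > > </coordinator-app>
> > > > > > ```
> > > > > >
> > > > > > With a name like this, the coordinator list in Oozie shows which
> > > > > > released artifact each running coordinator came from.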
> > > > > >
> > > > > > regards
> > > > > > /Pelle
> > > > > >
> > > > > > On Wed, Oct 19, 2016 at 10:05 AM, Abhishek Bafna <bafna.iitr@gmail.com> wrote:
> > > > > >
> > > > > > > Hi,
> > > > > > >
> > > > > > > Oozie does not have version control for jobs. When you submit a
> > > > > > > workflow/coordinator/bundle to oozie, it stores it in the DB and
> > > > > > > uses it from there for further execution.
> > > > > > >
> > > > > > > Thanks,
> > > > > > > Abhishek
> > > > > > > > On Oct 19, 2016, at 1:16 PM, goun na <gounna@gmail.com> wrote:
> > > > > > > >
> > > > > > > > Hi users,
> > > > > > > >
> > > > > > > > What is the best way to manage Oozie jobs? Is there a built-in
> > > > > > > > version control feature?
> > > > > > > >
> > > > > > > > Best,
> > > > > > > > Goun Na
> > > > > > >
> > > > > > >
> > > > > >
> > > > > >
> > > > > > --
> > > > > >
> > > > > > *Per Ullberg*
> > > > > > Data Vault Tech Lead
> > > > > > Odin Uppsala
> > > > > > +46 701612693 <+46+701612693>
> > > > > >
> > > > > > Klarna AB (publ)
> > > > > > Sveavägen 46, 111 34 Stockholm
> > > > > > Tel: +46 8 120 120 00 <+46812012000>
> > > > > > Reg no: 556737-0431
> > > > > > klarna.com
> > > > > >
> > > > >
> > > >
> > > >
> > > >
> > >
> >
> >
> >
> >
>



-- 
Peter Cseh
Software Engineer
<http://www.cloudera.com>

Re: Oozie Job + Version Control Tool

Posted by goun na <go...@gmail.com>.
Thanks for creating OOZIE-2876!
