Posted to user@oozie.apache.org by Artem Ervits <ar...@nyp.org> on 2013/03/05 22:48:58 UTC

create a dir with timestamp for sqoop import

Hello all,

I am trying to do the following in an Oozie workflow:

sqoop job --exec jobName -- --target-dir /test/$(date +%Y%m%d%H%M%S)

I can think of a couple of options. One is to create a sqoop action, but passing $(date +%Y%m%d%H%M%S) as an argument does not work. The other option I tried was to create a shell action that sets TIMESTAMP=$(date +%Y%m%d%H%M%S) and then use ${outputDir}/$TIMESTAMP as the sqoop output directory. Unfortunately, that did not work either. Another option I tried was ${outputDir}/${timestamp()}, but that format is not recognized by the OS. Is there a replace method for the ${timestamp()} function? That would be the easiest implementation.
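
For reference, a variable set inside a shell action lives only in that action's own process, so a later sqoop action never sees $TIMESTAMP. One common way around this is to have the shell action print key=value lines to stdout with capture-output enabled, then read the value back with wf:actionData(). The fragment below is only a sketch under assumptions: the node names (get-timestamp, import, fail), the script name timestamp.sh and its location next to the workflow are hypothetical, and the remaining sqoop arguments are left out.

```xml
<!-- Sketch only: node names and timestamp.sh are hypothetical -->
<action name="get-timestamp">
    <shell xmlns="uri:oozie:shell-action:0.1">
        <job-tracker>${jobTracker}</job-tracker>
        <name-node>${nameNode}</name-node>
        <!-- timestamp.sh is assumed to contain: echo "TIMESTAMP=$(date +%Y%m%d%H%M%S)" -->
        <exec>timestamp.sh</exec>
        <file>timestamp.sh</file>
        <!-- capture-output exposes key=value lines from stdout to later actions -->
        <capture-output/>
    </shell>
    <ok to="import"/>
    <error to="fail"/>
</action>

<action name="import">
    <sqoop xmlns="uri:oozie:sqoop-action:0.2">
        <job-tracker>${jobTracker}</job-tracker>
        <name-node>${nameNode}</name-node>
        <!-- other sqoop arguments omitted; only the target directory is shown -->
        <arg>--target-dir</arg>
        <arg>${outputDir}/${wf:actionData('get-timestamp')['TIMESTAMP']}</arg>
    </sqoop>
    <ok to="end"/>
    <error to="fail"/>
</action>
```

The timestamp here is the wall-clock time at which the shell action ran, so rerunning the workflow produces a different directory each time.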




--------------------

This electronic message is intended to be for the use only of the named recipient, and may contain information that is confidential or privileged.  If you are not the intended recipient, you are hereby notified that any disclosure, copying, distribution or use of the contents of this message is strictly prohibited.  If you have received this message in error or are not the named recipient, please notify us immediately by contacting the sender at the electronic mail address noted above, and delete and destroy all copies of this message.  Thank you.








RE: create a dir with timestamp for sqoop import

Posted by Artem Ervits <ar...@nyp.org>.
Thank you, I will look into that tomorrow.





RE: create a dir with timestamp for sqoop import

Posted by Paul Chavez <pc...@verticalsearchworks.com>.
I have a nightly Sqoop job that runs off a coordinator and uses a combination of output datasets and EL functions to provide the workflow parameters. The nice thing about that is that you can 'backfill' with the coordinator by putting past days in its start/end attributes.

Here's a piece of my workflow, using a sqoop action:

<sqoop xmlns="uri:oozie:sqoop-action:0.2">
 <job-tracker>${jobTracker}</job-tracker>
 <name-node>${nameNode}</name-node>
 <arg>import</arg>
 <arg>--table</arg>
 <arg>MyTable</arg>
 <arg>--target-dir</arg>
 <arg>${InputPath}</arg>
 <arg>--where</arg>
 <arg>"createddate between '${SDateMDY}' and '${SDateMDY} 23:59:59.997'"</arg>
</sqoop>

You can see I use two workflow properties, one for the path and one for the date format I want.

Then the coordinator:

<coordinator-app name="NightlySqoop_coord"
 frequency="${coord:days(1)}"
 start="${coord_start}" end="${coord_end}" timezone="America/Los_Angeles"
 xmlns="uri:oozie:coordinator:0.1">
 <datasets>
  <dataset name="ExportDaily" frequency="${coord:days(1)}"
   initial-instance="2013-01-01T11:00Z" timezone="America/Los_Angeles">
   <uri-template>${nameNode}/ExportDir/Export${YEAR}${MONTH}${DAY}</uri-template>
  </dataset>
 </datasets>
 <output-events>
  <data-out name="ExportPath" dataset="ExportDaily">
   <instance>${coord:current(-1)}</instance>
  </data-out>
 </output-events>
 <action>
  <workflow>
   ...
   <configuration>
    <property>
     <name>InputPath</name>
     <value>${coord:dataOut('ExportPath')}</value>
    </property>
    <property>
     <name>SDateMDY</name>
     <value>${coord:formatTime(coord:dateOffset(coord:nominalTime(), -1, 'DAY'), 'MM/dd/yyyy')}</value>
    </property>
   </configuration>
  </workflow>
 </action>
</coordinator-app>

The -1 offset on both properties is there because the job triggers every morning at 3am local time but I want it to pull the previous day's data. The output dataset instance provides the path I need based on the uri-template, and the EL functions coord:formatTime(), coord:dateOffset() and coord:nominalTime() build the formatted date string.
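
As a rough illustration of how those expressions resolve, here is what the two properties would look like for a single run. The values assume a nominal time of 2013-03-06T11:00Z (3am Pacific, before the DST change) and that Oozie evaluates the dataset URI and the date functions in UTC, which is its default. Both are assumptions for the sake of the example, not values taken from a real run.

```xml
<!-- Illustration only: assumes nominal time 2013-03-06T11:00Z and UTC evaluation -->
<configuration>
    <property>
        <name>InputPath</name>
        <!-- coord:current(-1) selects the 2013-03-05T11:00Z dataset instance -->
        <value>${nameNode}/ExportDir/Export20130305</value>
    </property>
    <property>
        <name>SDateMDY</name>
        <!-- nominal time minus one day, formatted as MM/dd/yyyy -->
        <value>03/05/2013</value>
    </property>
</configuration>
```

With those values the --where argument would expand to createddate between '03/05/2013' and '03/05/2013 23:59:59.997', and ${nameNode} would be replaced by whatever the job properties define.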

Hope that helps.

Paul

