Posted to user@oozie.apache.org by Something Something <ma...@gmail.com> on 2014/02/12 09:07:35 UTC

Sqoop action in Oozie

Let's say I have a table that looks something like this:

CREATE TABLE temperatures (id INT, created_time TIMESTAMP, value FLOAT);

with data like this:

INSERT INTO temperatures VALUES (1,   '2014-01-01 00:00:00', 45.5);
INSERT INTO temperatures VALUES (2,   '2014-01-01 00:10:00', 43.5);
INSERT INTO temperatures VALUES (3,   '2014-01-01 01:00:00', 46.5);
INSERT INTO temperatures VALUES (4,   '2014-01-01 01:10:00', 43.5);
INSERT INTO temperatures VALUES (5,   '2014-01-01 02:00:00', 44.5);
INSERT INTO temperatures VALUES (6,   '2014-01-01 02:10:00', 43.5);
...and more rows.

I want to schedule a job in Oozie that runs every hour and copies all rows
whose 'created_time' falls within that hour. The output should go into a
separate directory for each hour. For example, for the above data, the
output would go in:

/user/output/2014010100/
/user/output/2014010101/
/user/output/2014010102/
and so on.
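One way to derive each hour's bounds and directory name is sketched below in shell. It assumes GNU date (the -d option and the @epoch syntax are GNU coreutils features) and UTC timestamps; hour_window is a hypothetical helper, not part of Oozie or Sqoop. It uses a half-open interval (start inclusive, end exclusive) so that a row stamped exactly on the hour, such as 01:00:00, lands in only one directory. BETWEEN is inclusive on both ends and would put that row in two windows. In an hourly Oozie coordinator the input time would typically come from ${coord:nominalTime()}; how that value is formatted and passed down is left open here.

```shell
#!/usr/bin/env bash
# Sketch only: compute the half-open window [start, end) and the output
# directory for the hour that contains a given timestamp.
# Assumes GNU date and that created_time values are in UTC.
hour_window() {
  local ts="$1"                      # e.g. "2014-01-01 01:10:00"
  local s e
  s=$(date -u -d "$ts" +%s)          # timestamp -> epoch seconds (UTC)
  s=$(( s - s % 3600 ))              # truncate to the top of the hour
  e=$(( s + 3600 ))                  # exclusive upper bound: next hour
  printf '%s|%s|/user/output/%s/\n' \
    "$(date -u -d "@$s" '+%Y-%m-%d %H:%M:%S')" \
    "$(date -u -d "@$e" '+%Y-%m-%d %H:%M:%S')" \
    "$(date -u -d "@$s" '+%Y%m%d%H')"
}
```

For example, hour_window '2014-01-01 01:10:00' prints 2014-01-01 01:00:00|2014-01-01 02:00:00|/user/output/2014010101/. Whether a run imports the hour starting at its nominal time or the hour that just ended is a scheduling choice; importing the hour that just ended avoids reading rows that are still being written.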

How can I do this in Oozie?  I am thinking the Sqoop command would be
something like this:

# startTime, endTime and dateHr stand for the hour being imported
sqoop import --connect jdbc:mysql://localhost/test --username root \
  --target-dir /user/output/${dateHr}/ \
  --query "select * from temperatures where created_time >= '${startTime}' and created_time < '${endTime}' and \$CONDITIONS" \
  --split-by id
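Building the command as a bash array is one way to get the quoting right. The query keeps a literal $CONDITIONS for Sqoop to substitute, while the hour bounds are filled in. The sketch below only prints the arguments, one per line, so they can be inspected. sqoop_hour_cmd is a hypothetical name. The connection settings are copied from the command above. Splitting on the id column is an assumption; with a single mapper (-m 1), --split-by would not be needed.

```shell
#!/usr/bin/env bash
# Sketch only: assemble the Sqoop import for one hour and print it,
# one argument per line, instead of running it.
# Assumes start/end are "YYYY-MM-DD HH:MM:SS" strings for [start, end).
sqoop_hour_cmd() {
  local start="$1" end="$2" dir="$3"
  # \$CONDITIONS stays literal; Sqoop replaces it with each mapper's split predicate.
  local query="SELECT * FROM temperatures WHERE created_time >= '$start' AND created_time < '$end' AND \$CONDITIONS"
  local -a args=(
    sqoop import
    --connect jdbc:mysql://localhost/test
    --username root
    --target-dir "$dir"
    --query "$query"
    --split-by id
  )
  printf '%s\n' "${args[@]}"
}
```

To run the import instead of printing it, execute "${args[@]}" in place of the printf. Printing one argument per line also mirrors how a Sqoop action in an Oozie workflow is best written. When the command element is used, Oozie splits it on every space, so a query containing spaces should be passed with separate arg elements. No shell is involved there, so the \$ escaping is not needed.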

Any pointers would be greatly appreciated. Thanks.