Posted to user@oozie.apache.org by Something Something <ma...@gmail.com> on 2014/02/12 09:07:35 UTC
Sqoop action in Oozie
Let's say I have a table that looks something like this:
CREATE TABLE temperatures (id INT, created_time TIMESTAMP, value FLOAT);
with data like this:
INSERT INTO temperatures VALUES (1, '2014-01-01 00:00:00', 45.5);
INSERT INTO temperatures VALUES (2, '2014-01-01 00:10:00', 43.5);
INSERT INTO temperatures VALUES (3, '2014-01-01 01:00:00', 46.5);
INSERT INTO temperatures VALUES (4, '2014-01-01 01:10:00', 43.5);
INSERT INTO temperatures VALUES (5, '2014-01-01 02:00:00', 44.5);
INSERT INTO temperatures VALUES (6, '2014-01-01 02:10:00', 43.5);
& more...
I want to schedule a job in Oozie that runs every hour and copies all
rows whose 'created_time' falls within that hour. The output should go into
a separate directory for each hour. For example, for the data above, the output
would go in:
/user/output/2014010100/
/user/output/2014010101/
/user/output/2014010102/
& so on.
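To make the intended mapping concrete, here is a minimal Python sketch of how a timestamp would map to its hourly window and output directory. It assumes the directory name is the year, month, day and hour of the window (yyyyMMddHH); the function name `hourly_window` is hypothetical and only for illustration.

```python
from datetime import datetime, timedelta


def hourly_window(ts: datetime):
    """Return (start, end, output_dir) for the hour that contains ts."""
    # Truncate to the top of the hour so every timestamp in the same hour
    # maps to the same window.
    start = ts.replace(minute=0, second=0, microsecond=0)
    end = start + timedelta(hours=1)
    # Assumed naming scheme: /user/output/<yyyyMMddHH>/
    output_dir = "/user/output/" + start.strftime("%Y%m%d%H") + "/"
    return start, end, output_dir


if __name__ == "__main__":
    for text in ("2014-01-01 00:10:00", "2014-01-01 01:00:00", "2014-01-01 02:10:00"):
        ts = datetime.strptime(text, "%Y-%m-%d %H:%M:%S")
        start, end, out = hourly_window(ts)
        print(text, "->", out)
```

Rows 1 and 2 from the sample data would share one directory, rows 3 and 4 the next, and so on.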
How can I do this in Oozie? I am thinking the Sqoop command would be
something like this:
sqoop import --connect jdbc:mysql://localhost/test --username root
--target-dir /user/output/{dateHr} --query "select * from temperatures where
created_time between {starttime} and {endtime} and \$CONDITIONS" --split-by
id
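As a rough sketch of how that command could be filled in for one specific hour, the Python below builds the argument list for a given window. Several things here are assumptions rather than anything confirmed: the helper name `sqoop_import_args` is hypothetical, `id` is assumed to be a suitable `--split-by` column, and the filter uses a half-open range (`>=` start, `<` end) because `BETWEEN` includes both endpoints, so a row stamped exactly on the hour would otherwise be picked up by two consecutive runs.

```python
from datetime import datetime, timedelta


def sqoop_import_args(ts: datetime) -> list:
    """Build a sqoop import argument list for the hour that contains ts."""
    start = ts.replace(minute=0, second=0, microsecond=0)
    end = start + timedelta(hours=1)
    fmt = "%Y-%m-%d %H:%M:%S"
    # Sqoop free-form queries must contain the literal token $CONDITIONS,
    # joined to the rest of the WHERE clause with AND.
    query = (
        "SELECT * FROM temperatures "
        f"WHERE created_time >= '{start.strftime(fmt)}' "
        f"AND created_time < '{end.strftime(fmt)}' "
        "AND $CONDITIONS"
    )
    return [
        "sqoop", "import",
        "--connect", "jdbc:mysql://localhost/test",
        "--username", "root",
        # One directory per hour, named yyyyMMddHH (assumed scheme).
        "--target-dir", "/user/output/" + start.strftime("%Y%m%d%H"),
        "--query", query,
        # Assumption: split the import on the integer id column.
        "--split-by", "id",
    ]


if __name__ == "__main__":
    for arg in sqoop_import_args(datetime(2014, 1, 1, 1, 0, 0)):
        print(arg)
```

Because the arguments are kept as a list instead of one shell string, `$CONDITIONS` needs no backslash escaping here. In an Oozie workflow the same values would presumably come from coordinator parameters instead of Python.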
Any pointers would be greatly appreciated. Thanks.