Posted to user@hive.apache.org by Ayazur Rehman <re...@gmail.com> on 2015/06/30 22:45:19 UTC

schedule data ingestion to hive table using ftp

This may not be exactly a question for the Hive user group, but I do not
know of a better place to ask.

I want to schedule data ingestion into Hive from FTP. I need to schedule
a job that checks for files as they are generated and, once they appear,
moves them to HDFS.

Can anyone suggest the best way to do it?

-- 
Thanking You,
Ayazur Rehman

Re: schedule data ingestion to hive table using ftp

Posted by Gopal Vijayaraghavan <go...@apache.org>.
Hi,

> So, I want to schedule data ingestion to hive from ftp. I have to
>schedule a job to check for files that are getting generated and when
>they get generated, move it to hdfs.

There is no "best" way, unfortunately.

The options start with Apache Oozie, the bog-standard solution. Then
there's Falcon, which uses Oozie to run things inside but handles it
closer to Hive's use cases.

And there's the combination of Azkaban + Gobblin from LinkedIn.


For those who prefer Python to Java, there's Luigi from Spotify.

If you're feeling really lazy, you can go through the NFS mount option in
HDFS, so that a regular cron job can pull the files over sftp (with curl,
say) and write them straight into the NFSv3 mount.

The last option is totally tied to Unix cron, so it is not the best for
terabyte scale, but it's the one that is easiest to fix when it breaks.
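
As a rough illustration of that cron-driven option, here is a minimal
Python sketch of a script cron could run. It is only a sketch under
assumptions: it uses plain FTP through the standard-library ftplib (for
sftp you would need a different client), and the host, login, remote
directory, and the /mnt/hdfs mount point are all hypothetical. It copies
any file in the server directory that is not yet present in the
destination, writing to a temporary dot-prefixed name first so that a
half-copied file never appears under its final name. It does not check
whether the server has finished writing a file, which you would need to
handle separately.

```python
import os
from ftplib import FTP


def fetch_new_files(ftp, dest_dir):
    """Copy files from the FTP server's current directory into dest_dir,
    skipping any name that already exists there. Returns the names copied."""
    fetched = []
    for name in sorted(ftp.nlst()):
        target = os.path.join(dest_dir, name)
        if os.path.exists(target):
            continue  # already copied on an earlier run
        # Write under a temporary name, then rename once the copy is complete.
        tmp = os.path.join(dest_dir, "." + name + ".part")
        with open(tmp, "wb") as out:
            ftp.retrbinary("RETR " + name, out.write)
        os.rename(tmp, target)
        fetched.append(name)
    return fetched


def run(host, user, password, remote_dir, dest_dir):
    """One polling pass; every argument here is a placeholder you supply."""
    with FTP(host) as ftp:
        ftp.login(user, password)
        ftp.cwd(remote_dir)
        return fetch_new_files(ftp, dest_dir)


# Hypothetical call, e.g. from a small wrapper script that cron invokes:
# run("ftp.example.com", "user", "password", "/outgoing", "/mnt/hdfs/data/incoming")
```

Because the script only looks at what is already in the destination
directory, rerunning it after a failure simply picks up whatever is
still missing, which is most of why this approach is easy to fix.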

Cheers,
Gopal