Posted to user@hive.apache.org by Ayazur Rehman <re...@gmail.com> on 2015/06/30 22:45:19 UTC
schedule data ingestion to hive table using ftp
This may not be exactly a question for the Hive user group, but I do not
know of a better place to ask.
So, I want to schedule data ingestion into Hive from FTP. I need to schedule
a job that checks for files as they are generated and, once they appear,
moves them to HDFS.
Can anyone suggest the best way to do it?
--
Thanking You,
Ayazur Rehman
Re: schedule data ingestion to hive table using ftp
Posted by Gopal Vijayaraghavan <go...@apache.org>.
Hi,
> So, I want to schedule data ingestion to hive from ftp. I have to
>schedule a job to check for files that are getting generated and when
>they get generated, move it to hdfs.
There is no "best" way, unfortunately.
The options start with Apache Oozie, the bog-standard solution. Then
there's Falcon, which uses Oozie to run things inside, but handles it
closer to Hive's use cases.
And there's the combination of Azkaban + Gobblin from LinkedIn.
For those who prefer Python to Java, there's Luigi from Spotify.
If you're feeling really lazy, you can go through the NFS mount option in
HDFS, so that you can use regular cron to curl from sftp straight into the
NFSv3 mount.
The last option is totally tied to Unix cron, so it is not the best for
terabyte scale, but it's the one that is easiest to fix when it breaks.
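As a rough illustration of the cron option, here is a minimal sketch in Python. It swaps curl/sftp for the standard-library ftplib (plain FTP, as in the original question). The host name, the remote directory and the mount path are all hypothetical, and it assumes the remote directory holds only plain files that are complete by the time they are listed.

```python
import ftplib
import os


def fetch_new_files(ftp, dest_dir):
    """Copy files from the server's current directory that are not in dest_dir yet.

    `ftp` is a connected ftplib.FTP (anything with nlst() and retrbinary() works).
    Returns the names of the files copied during this run.
    """
    copied = []
    for name in sorted(ftp.nlst()):
        target = os.path.join(dest_dir, name)
        if os.path.exists(target):
            continue  # already copied by an earlier run
        # Write under a temporary dot-name first and rename when done, so a
        # partially downloaded file never shows up under its final name.
        tmp = os.path.join(dest_dir, "." + name + ".part")
        with open(tmp, "wb") as out:
            ftp.retrbinary("RETR " + name, out.write)
        os.rename(tmp, target)
        copied.append(name)
    return copied


def main():
    # Hypothetical values: replace with your own server and mount point.
    host = "ftp.example.com"
    remote_dir = "/outgoing"
    dest_dir = "/mnt/hdfs/data/incoming"  # assumed HDFS NFS mount on this box

    with ftplib.FTP(host) as ftp:
        ftp.login()  # anonymous login; pass user/passwd if the server needs them
        ftp.cwd(remote_dir)
        for name in fetch_new_files(ftp, dest_dir):
            print("copied", name)
```

To run it from cron you would add a call to main() at the bottom of the script and point a crontab entry at it. The "skip if it already exists" check is what makes reruns safe, which is most of why this approach is easy to fix when it breaks.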
Cheers,
Gopal