Posted to hdfs-user@hadoop.apache.org by Allen Wittenauer <aw...@apache.org> on 2011/03/25 17:46:57 UTC
Re: A way to monitor HDFS for a file to come live, and then kick off a job?
On Mar 24, 2011, at 10:09 AM, Jonathan Coveney wrote:
> I am not sure if this is the right listserv, forgive me if it is not.
A better choice would likely be hdfs-user@, since this is really about watching files in HDFS.
> My
> goal is this: monitor HDFS until a file is created, and then kick off a job.
> Ideally I'd want to do this continuously, but the file would be created
> hourly (with some sort of variance). I guess I could make a script that
> would ping the server every 5 minutes or something, but I was wondering if
> there might be a more elegant way?
Two ways off the top of my head:
1) Read/watch the edits stream
2) Read/watch the HDFS audit log
Given the latter is text built by log4j, that should be relatively simple to implement.
There was a JIRA asking for this functionality to be built in recently, btw.
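A minimal sketch of option 2, for illustration only. It assumes the audit log lines carry whitespace-separated key=value fields (ugi=, ip=, cmd=, src=, dst=, perm=) after the log4j prefix; check that layout against your own NameNode's log before relying on it. The function names and the sample path are hypothetical.

```python
import re

# Assumption: each audit line contains key=value fields separated by
# whitespace, e.g. "cmd=create" and "src=/some/path". Paths containing
# spaces are not handled by this pattern.
FIELD = re.compile(r"(\w+)=(\S+)")


def parse_audit_line(line):
    """Return the key=value fields of one audit log line as a dict."""
    return dict(FIELD.findall(line))


def created_paths(lines, prefix):
    """Yield the src path of every cmd=create event under prefix."""
    for line in lines:
        fields = parse_audit_line(line)
        if fields.get("cmd") == "create" and fields.get("src", "").startswith(prefix):
            yield fields["src"]
```

One way to use this is to feed created_paths() the lines of the audit log as they are appended (for example from a process that follows the file) and start the job for each path it yields. A create entry may only mean that writing has started, so the job may still need to confirm that the file is complete before reading it.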
Re: A way to monitor HDFS for a file to come live, and then kick off
a job?
Posted by Eric <er...@gmail.com>.
You can also use a FUSE mount and use a cronjob to check if new files
have arrived. You may want to make sure to create a pid file that is checked so
you won't run the script again before the previous run has finished.
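A rough sketch of the cron-plus-pid-file idea, assuming HDFS is already visible through a FUSE mount so that ordinary file calls work on it. The function names, return values, and any paths you pass in are hypothetical; this is one possible shape, not a tested deployment.

```python
import os


def acquire_pidfile(path):
    """Create the pid file atomically; return False if it already exists."""
    try:
        # O_EXCL makes creation fail if another run already holds the file.
        fd = os.open(path, os.O_CREAT | os.O_EXCL | os.O_WRONLY)
    except FileExistsError:
        return False
    with os.fdopen(fd, "w") as f:
        f.write(str(os.getpid()))
    return True


def run_once(watched, pidfile, job):
    """Run job(watched) if the file exists and no earlier run is active."""
    if not os.path.exists(watched):
        return "missing"
    if not acquire_pidfile(pidfile):
        return "busy"
    try:
        job(watched)
        return "ran"
    finally:
        # Release the guard even if the job raised.
        os.remove(pidfile)
```

A cron entry would call run_once() every few minutes with the mounted path of the expected file. Two things this sketch leaves out: a crash that kills the process outright can leave a stale pid file behind, and the job itself has to record or move the file it processed so that the next cron run does not pick it up again.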
Re: A way to monitor HDFS for a file to come live, and then kick off
a job?
Posted by Lance Norskog <go...@gmail.com>.
Hamake does exactly this:
http://code.google.com/p/hamake/
--
Lance Norskog
goksron@gmail.com