Posted to hdfs-user@hadoop.apache.org by Allen Wittenauer <aw...@apache.org> on 2011/03/25 17:46:57 UTC

Re: A way to monitor HDFS for a file to come live, and then kick off a job?

On Mar 24, 2011, at 10:09 AM, Jonathan Coveney wrote:

> I am not sure if this is the right listserv, forgive me if it is not.

	A better choice would likely be hdfs-user@, since this is really about watching files in HDFS.


> My goal is this: monitor HDFS until a file is created, and then kick off a job.
> Ideally I'd want to do this continuously, but the file would be created
> hourly (with some sort of variance). I guess I could make a script that
> would poll the server every 5 minutes or so, but I was wondering if
> there might be a more elegant way?

	Two ways off the top of my head:

	1) Read/watch the edits stream

	2) Read/watch the HDFS audit log

	Given that the latter is plain text written by log4j, it should be relatively simple to implement.

There was a JIRA asking for this functionality to be built in recently, btw.
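A minimal sketch of the audit-log approach in Python, assuming the audit log is readable on (or copied from) the NameNode host and that each line carries whitespace-separated key=value fields such as cmd=, src= and dst=. The exact layout varies with the Hadoop version and log4j pattern, so the regex and the function names here are illustrative, not a Hadoop API. As far as I know, the create event is logged when the file is opened for writing rather than when it is closed, so triggering on create alone could hand the job a partial file; the sketch therefore also accepts a rename onto the target path, which fits writers that upload under a temporary name and rename into place.

```python
import re
import time

# Assumption: audit lines look roughly like
#   "... FSNamesystem.audit: ugi=alice ip=/10.0.0.5 cmd=create src=/data/x dst=null perm=..."
# with key=value fields separated by tabs or spaces.
FIELD_RE = re.compile(r"(\w+)=(\S+)")


def parse_audit_line(line):
    """Return the key=value fields of one audit log line as a dict."""
    return dict(FIELD_RE.findall(line))


def is_arrival(fields, target):
    """True if this event makes `target` appear: a create of it, or a rename onto it."""
    cmd = fields.get("cmd")
    if cmd == "create" and fields.get("src") == target:
        return True
    if cmd == "rename" and fields.get("dst") == target:
        return True
    return False


def follow(path, poll_seconds=1.0):
    """Yield lines appended to a local file, like `tail -f` (no log-rotation handling)."""
    with open(path) as f:
        f.seek(0, 2)  # start at the end: only new events matter
        while True:
            line = f.readline()
            if line:
                yield line
            else:
                time.sleep(poll_seconds)


def wait_for_file(audit_log, target):
    """Block until the audit log records `target` being created or renamed into place."""
    for line in follow(audit_log):
        if is_arrival(parse_audit_line(line), target):
            return
```

Calling wait_for_file() on the audit log with the expected HDFS path (both hypothetical here) blocks until the file shows up, after which the script can launch the job. follow() does not notice when log4j rolls the file over, so a long-running watcher would need to reopen it.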

Re: A way to monitor HDFS for a file to come live, and then kick off a job?

Posted by Eric <er...@gmail.com>.
You can also use a FUSE mount and a cron job to check whether new files
have arrived. You may want to create a pid file that is checked on each run, so
you won't start the script again before the previous run has finished.
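A minimal sketch of the cron variant in Python, assuming HDFS is FUSE-mounted so the file to wait for appears as an ordinary local path. TARGET, PID_FILE, run_job() and main() are hypothetical placeholders. The pid file is created with O_CREAT | O_EXCL, so two overlapping cron runs cannot both acquire it, and a pid file left behind by a crashed run is treated as stale once its PID no longer exists.

```python
import os

# Hypothetical paths: the file to wait for under the FUSE mount,
# and where this script keeps its pid file.
TARGET = "/mnt/hdfs/data/hourly/input.log"
PID_FILE = "/tmp/hdfs-watch.pid"


def pid_alive(pid):
    """Best-effort POSIX check that a process with this PID exists."""
    try:
        os.kill(pid, 0)  # signal 0: existence/permission check only
    except ProcessLookupError:
        return False
    except PermissionError:
        return True  # exists, but belongs to another user
    return True


def acquire_pid_file(path):
    """Atomically create the pid file; return False if a live run holds it."""
    while True:
        try:
            fd = os.open(path, os.O_CREAT | os.O_EXCL | os.O_WRONLY, 0o644)
        except FileExistsError:
            try:
                with open(path) as f:
                    old = int(f.read().strip())
            except FileNotFoundError:
                continue  # the holder just finished; try again
            except ValueError:
                return False  # empty or partial: another run is still starting
            if old > 0 and pid_alive(old):
                return False
            try:
                os.remove(path)  # stale file left by a crashed run
            except FileNotFoundError:
                pass
            continue
        with os.fdopen(fd, "w") as f:
            f.write(str(os.getpid()))
        return True


def run_job():
    """Hypothetical placeholder: launch the Hadoop job here."""
    print("file arrived; launching job")


def main(target=TARGET, pid_file=PID_FILE, job=run_job):
    """One cron tick. Returns True if the job was started."""
    if not acquire_pid_file(pid_file):
        return False  # previous run still going; the next tick retries
    try:
        if os.path.exists(target):
            job()
            return True
        return False
    finally:
        os.remove(pid_file)
```

Calling main() from a cron entry every five minutes gives one check per tick. The sketch does not remember which files it has already handled, so a real script would also need to mark processed input, for example by moving or renaming it once the job has been submitted.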


Re: A way to monitor HDFS for a file to come live, and then kick off a job?

Posted by Lance Norskog <go...@gmail.com>.
Hamake does exactly this:

http://code.google.com/p/hamake/


-- 
Lance Norskog
goksron@gmail.com