Posted to user@flink.apache.org by ph...@free.fr on 2016/03/07 15:53:01 UTC

Flink and Directory Monitors

Hello,

has anyone ever used Flink with file/directory monitoring applications such as Directory Monitor (https://directorymonitor.com/)?

Is it conceivable to process file-update events with Flink? For instance, let's say hundreds of users simultaneously modify files on a filesystem. Directory Monitor detects those modifications and sends them as events, streams, or log entries to Flink, which processes them to extract, say, the names of the files that have been modified the most over a period of time, or the names of the biggest filesystem hogs (i.e., users who consume the most filesystem space).

Would Hadoop be needed between Directory Monitor and Flink, to store historical, filesystem-change data?

Many thanks.

Philippe


Re: Flink and Directory Monitors

Posted by Fabian Hueske <fh...@gmail.com>.
Hi Philippe,

I am not aware of anybody using Directory Monitor with Flink. However, the
application you described sounds reasonable and I think it should be
possible to implement that with Flink.
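
To make the computation concrete, here is a minimal sketch in plain Java
(deliberately not Flink API) of "files modified the most over a period of
time". FileEvent and ModificationCounter are hypothetical names, and the
event fields are assumptions, since the format Directory Monitor emits is
not specified in this thread.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical event shape; the real Directory Monitor output may differ.
final class FileEvent {
    final String path;
    final String user;
    final long timestampMillis;

    FileEvent(String path, String user, long timestampMillis) {
        this.path = path;
        this.user = user;
        this.timestampMillis = timestampMillis;
    }
}

final class ModificationCounter {
    // Returns up to n paths modified most often within
    // [windowStart, windowStart + windowSizeMillis), most frequent first.
    static List<String> topModified(List<FileEvent> events,
                                    long windowStart,
                                    long windowSizeMillis,
                                    int n) {
        long windowEnd = windowStart + windowSizeMillis;
        Map<String, Integer> counts = new HashMap<>();
        for (FileEvent e : events) {
            if (e.timestampMillis >= windowStart && e.timestampMillis < windowEnd) {
                counts.merge(e.path, 1, Integer::sum);
            }
        }
        List<String> paths = new ArrayList<>(counts.keySet());
        // Descending by count; ties broken by path so the order is deterministic.
        paths.sort((a, b) -> {
            int byCount = Integer.compare(counts.get(b), counts.get(a));
            return byCount != 0 ? byCount : a.compareTo(b);
        });
        return new ArrayList<>(paths.subList(0, Math.min(n, paths.size())));
    }
}
```

In a Flink job the same logic would typically be expressed as a keyed,
windowed aggregation over the event stream rather than a loop over a list.
The "filesystem hogs" variant would key by user and sum file sizes instead,
assuming the events carry a size field.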

You would need to either implement a SourceFunction that forwards events
from DM to Flink, or push the DM events into Kafka and use Flink's Kafka
SourceFunction. Using Kafka has the benefit that fault tolerance and
exactly-once behavior are much easier to achieve because Kafka buffers
events for some time and Flink's Kafka source can replay the events if
necessary. If you implement a direct DM source for Flink, you would need to
implement the buffering yourself to achieve exactly-once or at-least-once
guarantees.
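
As a rough illustration of what "implement the buffering yourself" could
involve, here is a minimal sketch in plain Java. ReplayBuffer is a
hypothetical helper, not a Flink class: it retains every event handed
downstream until a completed checkpoint confirms it, so unconfirmed events
can be emitted again after a failure.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Deque;
import java.util.List;

// Hypothetical helper, not part of Flink: keeps events until they are confirmed.
final class ReplayBuffer<T> {
    private final Deque<T> unconfirmed = new ArrayDeque<>();
    private long firstOffset = 0; // offset of the oldest retained event
    private long nextOffset = 0;  // offset the next appended event will receive

    // Record an event as it is handed downstream; returns its offset.
    long append(T event) {
        unconfirmed.addLast(event);
        return nextOffset++;
    }

    // A completed checkpoint covers every event with an offset below
    // upToOffset, so those events no longer need to be retained.
    void confirm(long upToOffset) {
        while (firstOffset < upToOffset && !unconfirmed.isEmpty()) {
            unconfirmed.removeFirst();
            firstOffset++;
        }
    }

    // After a failure, everything not yet confirmed is emitted again, in order.
    List<T> replay() {
        return new ArrayList<>(unconfirmed);
    }
}
```

Replaying unconfirmed events like this gives at-least-once delivery.
Exactly-once would additionally require restoring the replay position
together with Flink's checkpointed state. This sketch also lives only in
memory, so a real source would have to persist the buffer to survive a
crash of the process that hosts it, which is the job Kafka does in the
other option.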

You do not need HDFS to communicate between DM and Flink; events can be
consumed directly without going through a filesystem. However, Flink
requires a persistent state backend to back up checkpoints for failure
recovery. This is usually HDFS, but that component is pluggable.

Cheers, Fabian
