Posted to user@flink.apache.org by Esa Heikkinen <es...@student.tut.fi> on 2018/02/07 08:40:31 UTC

Flink CEP with files and no streams?

Hello,

I am trying to use Flink's CEP library on log files (as a batch job) rather than on real-time streams.
Is that possible? If yes, do you know of any Scala code examples for this?

Or should I convert the log files (with timestamps) into streams?
And how are timestamps handled in Flink?

If I cannot use Flink for this purpose at all, can you recommend other tools?

I would like to do CEP-style analysis of log files.



Re: Flink CEP with files and no streams?

Posted by Fabian Hueske <fh...@gmail.com>.
Hi,

I'm not aware of a good example but I can give you some pointers.

- Implement the SourceFunction interface. This function will not be
executed in parallel, so you don't have to worry about parallelism.
- Since you said you want to run it as a batch job, you might not need to
implement checkpointing functionality.
- In the run method, open the file that you need to read and start parsing
it. Whenever you have a record, extract its timestamp and emit both by
passing them to the SourceContext.
- Every n-th record, you can emit a watermark. The watermark timestamp must
be smaller than the timestamps of all records that will be emitted in the
future.
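
The pointers above could be sketched in Scala roughly as follows. This is only a minimal sketch, not a tested reference implementation. It assumes the Flink Scala DataStream API of the 1.4 era (newer releases have since changed these APIs), a single log file whose lines look like "<epoch-millis> <message>", and lines that are already sorted by timestamp. The names LogEvent, LogFileSource, LogFileJob and watermarkEvery are made up for illustration.

```scala
import org.apache.flink.streaming.api.TimeCharacteristic
import org.apache.flink.streaming.api.functions.source.SourceFunction
import org.apache.flink.streaming.api.scala._
import org.apache.flink.streaming.api.watermark.Watermark

import scala.io.Source
import scala.util.Try

// Hypothetical record type: event time in epoch milliseconds plus the log text.
case class LogEvent(timestamp: Long, message: String)

object LogFileSource {
  // Assumed line format: "<epoch-millis> <message>". Unparseable lines yield None.
  def parse(line: String): Option[LogEvent] = {
    val idx = line.indexOf(' ')
    if (idx <= 0) None
    else Try(line.substring(0, idx).toLong).toOption
      .map(ts => LogEvent(ts, line.substring(idx + 1)))
  }
}

// Non-parallel source: reads one file in order, attaches timestamps,
// and emits a watermark after every `watermarkEvery` records.
class LogFileSource(path: String, watermarkEvery: Int = 100)
    extends SourceFunction[LogEvent] {

  @volatile private var running = true

  override def run(ctx: SourceFunction.SourceContext[LogEvent]): Unit = {
    val file = Source.fromFile(path)
    try {
      val lines = file.getLines()
      var count = 0L
      while (running && lines.hasNext) {
        LogFileSource.parse(lines.next()).foreach { event =>
          ctx.collectWithTimestamp(event, event.timestamp)
          count += 1
          // Assumes the file is sorted by timestamp: later records may share
          // this timestamp, so the watermark stays strictly below it.
          if (count % watermarkEvery == 0) {
            ctx.emitWatermark(new Watermark(event.timestamp - 1))
          }
        }
      }
    } finally {
      file.close()
    }
  }

  override def cancel(): Unit = running = false
}

object LogFileJob {
  def main(args: Array[String]): Unit = {
    val env = StreamExecutionEnvironment.getExecutionEnvironment
    env.setStreamTimeCharacteristic(TimeCharacteristic.EventTime)

    // args(0) is the path of the log file to read.
    val events: DataStream[LogEvent] = env.addSource(new LogFileSource(args(0)))
    events.print()

    env.execute("log-file-as-stream")
  }
}
```

The resulting DataStream[LogEvent] carries event-time timestamps, so it can be handed to CEP.pattern like any other stream. If the file is not sorted by timestamp, the watermark logic above is not safe and would need a different strategy.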

I'd start by processing a single file and extend the source from there.

Hope this helps,
Fabian


RE: Flink CEP with files and no streams?

Posted by Esa Heikkinen <es...@student.tut.fi>.
Hi

Thanks for the reply, but since I am a newbie with Flink, do you have any good Scala code examples for this?

Esa





Re: Flink CEP with files and no streams?

Posted by Fabian Hueske <fh...@gmail.com>.
Hi Esa,

You can also read files as a stream.
However, you have to be careful about the order in which you read the files
and how you generate watermarks.
The easiest approach is to implement a non-parallel source function that
reads the files in the right order and generates watermarks.
Things become trickier when you try to read the files in parallel.
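
Once the log records arrive as a stream with timestamps and watermarks, the CEP library can be applied to it. The following is a rough Scala sketch under several assumptions: the Flink Scala CEP and DataStream APIs of the 1.4 era, a made-up LogEvent type, and a few hard-coded records standing in for the parsed log file. The pattern itself (an ERROR line followed by a FATAL line within 10 seconds) is only an illustration.

```scala
import org.apache.flink.cep.scala.CEP
import org.apache.flink.cep.scala.pattern.Pattern
import org.apache.flink.streaming.api.TimeCharacteristic
import org.apache.flink.streaming.api.scala._
import org.apache.flink.streaming.api.windowing.time.Time

// Hypothetical record type: event time in epoch milliseconds plus the log text.
case class LogEvent(timestamp: Long, message: String)

object LogCepJob {
  // Turns one match (pattern name -> matched events) into a readable line.
  def describe(m: scala.collection.Map[String, Iterable[LogEvent]]): String = {
    val first = m("first").head
    val second = m("second").head
    s"ERROR at ${first.timestamp} followed by FATAL at ${second.timestamp}"
  }

  def main(args: Array[String]): Unit = {
    val env = StreamExecutionEnvironment.getExecutionEnvironment
    env.setStreamTimeCharacteristic(TimeCharacteristic.EventTime)
    env.setParallelism(1) // keep the records in their original order

    // Stand-in for the parsed log file; timestamps are already ascending,
    // so Flink can derive the watermarks from them.
    val events: DataStream[LogEvent] = env
      .fromElements(
        LogEvent(1000L, "INFO start"),
        LogEvent(2000L, "ERROR disk"),
        LogEvent(4000L, "FATAL crash"))
      .assignAscendingTimestamps(_.timestamp)

    // Illustrative pattern: an ERROR line followed by a FATAL line within 10 s.
    val pattern = Pattern
      .begin[LogEvent]("first").where((e: LogEvent) => e.message.startsWith("ERROR"))
      .followedBy("second").where((e: LogEvent) => e.message.startsWith("FATAL"))
      .within(Time.seconds(10))

    CEP.pattern(events, pattern).select(describe _).print()

    env.execute("cep-on-log-events")
  }
}
```

In a real job the fromElements call would be replaced by a source that reads the log files in the right order and emits timestamps and watermarks itself; assignAscendingTimestamps is only appropriate when the timestamps never decrease.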

Best, Fabian
