You are viewing a plain text version of this content. The canonical link for it is here.
Posted to issues@flink.apache.org by "Kostas Kloudas (JIRA)" <ji...@apache.org> on 2017/10/05 12:49:00 UTC

[jira] [Commented] (FLINK-7606) CEP operator leaks state

    [ https://issues.apache.org/jira/browse/FLINK-7606?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16192818#comment-16192818 ] 

Kostas Kloudas commented on FLINK-7606:
---------------------------------------

Hi [~info@paolorendano.it], 

Sorry for the late reply.

So, if I understand correctly, in a nutshell:
1) for event time, there is no memory leak problem (there is also a pending PR that probably fixes the problem also for processing time by unifying the code paths for both notions of time), but 
2) at the end of you input, the watermark does not advance and the last batch of events is not processed as it is waiting for the watermark <last message timestamp>+10 sec to trigger the computation, right?

In this case, if you know that your stream is finite, then you can close your source (call close() on your source)  and this will send a watermark Long.MAX_VALUE that will flush the buffered elements.

Kostas

> CEP operator leaks state
> ------------------------
>
>                 Key: FLINK-7606
>                 URL: https://issues.apache.org/jira/browse/FLINK-7606
>             Project: Flink
>          Issue Type: Bug
>          Components: CEP
>    Affects Versions: 1.3.1
>            Reporter: Matteo Ferrario
>         Attachments: heap-dump1.png, heap-dump2.png, heap-dump3.png, Schermata 2017-09-27 alle 00.35.53.png
>
>
> The NestedMapsStateTable grows up continuously without free the heap memory.
> We created a simple job that processes a stream of messages and uses CEP to generate an outcome message when a specific pattern is identified.
> The messages coming from the stream are grouped by a key defined in a specific field of the message.
> We've also added the "within" clause (set as 5 minutes), indicating that two incoming messages match the pattern only if they come in a certain time window.
> What we've seen is that for every key present in the message, an NFA object is instantiated in the NestedMapsStateTable and it is never deallocated.
> Also the "within" clause didn't help: we've seen that if we send messages that don't match the pattern, the memory grows up (I suppose that the state of NFA is updated) but it is not cleaned also after the 5 minutes of time window defined in "within" clause.
> If you need, I can provide more details about the job we've implemented and also the screenshots about the memory leak.



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)