Posted to issues@flink.apache.org by dawidwys <gi...@git.apache.org> on 2017/07/27 12:44:14 UTC

[GitHub] flink issue #4331: [FLINK-7169][CEP] Support AFTER MATCH SKIP function in CE...

Github user dawidwys commented on the issue:

    https://github.com/apache/flink/pull/4331
  
    Hi @yestinchen ,
    Thanks for the update.
    
    After a second round of review I found several problems with the current approach. In most cases it returns only the first match in a stream.
    
    1. Let's analyze a pattern `A B C` with `SKIP_TO_FIRST C` and a sequence `a1 b1 c1 a2 b2 c2`. It will return only `a1 b1 c1` and will leave the NFA without any valid `ComputationalStates`, which stops any further processing (see the sketch after this list).
    
    2. Another problem is that we do not handle matches that can potentially finish before previously started ones. E.g. for the Pattern
    
    ```
    Pattern<Event, ?> pattern = Pattern.<Event>begin("ab").where(new SimpleCondition<Event>() {
    	@Override
    	public boolean filter(Event value) throws Exception {
    		return value.getName().equals("a") || value.getName().equals("b");
    	}
    }).followedBy("c").where(new IterativeCondition<Event>() {
    	@Override
    	public boolean filter(Event value, Context<Event> ctx) throws Exception {
    		return value.getName().equals("c") && ctx.getEventsForPattern("ab").iterator().next().getPrice() == value.getPrice();
    	}
    }).setAfterMatchSkipStrategy(new AfterMatchSkipStrategy(AfterMatchSkipStrategy.SkipStrategy.SKIP_PAST_LAST_EVENT));
    ```
    
    and a sequence `a(price = 1) b(price = 2) c(price = 2)`. I think the desired behaviour would be to start a new matching after the `c` event, but it won't, as the matching started at `a` and will never start at `b`.
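
    To make the scenario from point 1 concrete, here is a rough sketch of how I imagine such a pattern being declared. It reuses the `Event#getName` accessor and the skip-strategy call from the snippet above; the event-name prefixes `a`/`b`/`c` and the two-argument `AfterMatchSkipStrategy` constructor naming the pattern to skip to are my assumptions, not something I verified against this PR:

    ```
    Pattern<Event, ?> abc = Pattern.<Event>begin("A").where(new SimpleCondition<Event>() {
    	@Override
    	public boolean filter(Event value) throws Exception {
    		return value.getName().startsWith("a");
    	}
    }).next("B").where(new SimpleCondition<Event>() {
    	@Override
    	public boolean filter(Event value) throws Exception {
    		return value.getName().startsWith("b");
    	}
    }).next("C").where(new SimpleCondition<Event>() {
    	@Override
    	public boolean filter(Event value) throws Exception {
    		return value.getName().startsWith("c");
    	}
    }).setAfterMatchSkipStrategy(
    	// hypothetical two-argument constructor, mirroring the style of the snippet above
    	new AfterMatchSkipStrategy(AfterMatchSkipStrategy.SkipStrategy.SKIP_TO_FIRST, "C"));
    ```

    For the input `a1 b1 c1 a2 b2 c2` I would expect both `a1 b1 c1` and `a2 b2 c2` to be emitted, but with the current approach only the first match is returned and the NFA is left without valid computation states.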
    
    ---
    
    Some general notes:
    
    1. I think the SQL specification does not suit the CEP library well, as we do not operate on a partitioned/bounded collection of events; the specification, on the other hand, assumes such bounded data. I think we would benefit from some additional documentation on how the AFTER_MATCH clause works in the case of unbounded data, e.g. what an **_empty match_** means:
    
    > Note that the AFTER MATCH SKIP syntax only determines the point to resume scanning for a match after a non-empty match. When an empty match is found, one row is skipped (as if SKIP TO NEXT ROW had been specified). Thus an empty match never causes one of these exceptions.
    
    etc.
    
    2. I really don't like the idea of there being so many cases in which a `RuntimeException` can be thrown. I feel the reason for using CEP is constantly running jobs that search for patterns in a stream, rather than ad-hoc queries.
    E.g. in the case of a Pattern like `A B? C` with `SKIP_TO_LAST B`, a sequence like `a c` results in an exception and the job being killed. In my opinion that does not suit a constantly running job well. From the operational side, running such Patterns would be at least interesting ;), as they depend so much on the arriving data. (See the sketch after this list.)
    
    3. I don't know the reasoning, but Esper, which was mentioned as the other (besides Oracle) library that supports the `MATCH_RECOGNIZE` clause, does not support `AFTER MATCH` at all.
    
    4. I found out there was already ongoing work to introduce part of `AFTER MATCH` (the `SKIP_PAST_LAST`). The corresponding JIRA: https://issues.apache.org/jira/browse/FLINK-3703 and the closed PR: #2367 .
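
    To illustrate the concern from note 2, here is a minimal sketch of the `A B? C` pattern, again written against the skip-strategy style used in the snippet in point 2 above; the two-argument `SKIP_TO_LAST` constructor naming the pattern to skip to is my assumption:

    ```
    Pattern<Event, ?> abOptC = Pattern.<Event>begin("A").where(new SimpleCondition<Event>() {
    	@Override
    	public boolean filter(Event value) throws Exception {
    		return value.getName().equals("a");
    	}
    }).next("B").where(new SimpleCondition<Event>() {
    	@Override
    	public boolean filter(Event value) throws Exception {
    		return value.getName().equals("b");
    	}
    }).optional().next("C").where(new SimpleCondition<Event>() {
    	@Override
    	public boolean filter(Event value) throws Exception {
    		return value.getName().equals("c");
    	}
    }).setAfterMatchSkipStrategy(
    	// hypothetical two-argument constructor, mirroring the snippet in point 2 above
    	new AfterMatchSkipStrategy(AfterMatchSkipStrategy.SkipStrategy.SKIP_TO_LAST, "B"));
    ```

    For the sequence `a c` the match contains no event for `B`, so there is nothing to skip to and the job dies with an exception, even though the pattern itself is perfectly reasonable.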
    
    To sum up, thanks @yestinchen for the work. Unfortunately I think the clause needs a little bit more conceptual discussion before we can introduce this change. I think the `SKIP_PAST_LAST` behaviour would be very helpful (in fact, there were already requests for it on the mailing list) and the most straightforward to implement. I would love to hear your opinions @yestinchen, as well as @kl0u and @dianfu.
    


