You are viewing a plain text version of this content. The canonical link for it is here.
Posted to dev@flink.apache.org by Anton <ku...@gmail.com> on 2016/06/21 05:55:03 UTC

Advice - Drools in Flink

Hello

Firstly, I am an absolute Flink newbie.

I am interested in using Drools in Flink - in a similar case to what is
described in this blog, where Drools is used in Spark.
http://blog.cloudera.com/blog/2015/11/how-to-build-a-complex-event-processing-app-on-apache-spark-and-drools/

The basic idea is that Drools can be used to reason over streaming data.

The high-level use case is, there are several hundred users who want to
write rules to be notified on events related to changes in specific
streams. For example, notify me when a specific stock price changes by so
much.

Due to the number of users, the more end-user focused syntax of Drools, and
the number of rule changes, doing this in Drools makes more sense than to
write and deploy plain-old-flink (apologies, am not familiar with the
correct term for a flink process).

Also, as Drools has some powerful CEP operators, it could be very
interested to have these available in Flink too.

My question, therefore, is, very general - how best to integrate Drools
into Flink? Where should I start?

Thanks and regards
Anton

Re: Advice - Drools in Flink

Posted by Aljoscha Krettek <al...@apache.org>.
Hi,
you can also take a look at this:
https://techblog.king.com/rbea-scalable-real-time-analytics-king/. From a
high level it seems Drools implements something similar to the system that
they developed on top of Flink.

Cheers,
Aljoscha

On Tue, 21 Jun 2016 at 12:04 Robert Metzger <rm...@apache.org> wrote:

> Hi Anton,
>
> I think embedding Drools into Flink should be doable. Drools seems to be
> implemented in Java, so you can probably call the engine from Flink.
> I would start putting a flatMap() operator into a stream. In the operator,
> I would start the Drools engine (probably in the open() method) and then
> give the individual records to the engine.
>
> Regards,
> Robert
>
>
> On Tue, Jun 21, 2016 at 7:55 AM, Anton <ku...@gmail.com> wrote:
>
> > Hello
> >
> > Firstly, I am an absolute Flink newbie.
> >
> > I am interested in using Drools in Flink - in a similar case to what is
> > described in this blog, where Drools is used in Spark.
> >
> >
> http://blog.cloudera.com/blog/2015/11/how-to-build-a-complex-event-processing-app-on-apache-spark-and-drools/
> >
> > The basic idea is that Drools can be used to reason over streaming data.
> >
> > The high-level use case is, there are several hundred users who want to
> > write rules to be notified on events related to changes in specific
> > streams. For example, notify me when a specific stock price changes by so
> > much.
> >
> > Due to the number of users, the more end-user focused syntax of Drools,
> and
> > the number of rule changes, doing this in Drools makes more sense than to
> > write and deploy plain-old-flink (apologies, am not familiar with the
> > correct term for a flink process).
> >
> > Also, as Drools has some powerful CEP operators, it could be very
> > interested to have these available in Flink too.
> >
> > My question, therefore, is, very general - how best to integrate Drools
> > into Flink? Where should I start?
> >
> > Thanks and regards
> > Anton
> >
>

Re: Advice - Drools in Flink

Posted by Robert Metzger <rm...@apache.org>.
Hi Anton,

I think embedding Drools into Flink should be doable. Drools seems to be
implemented in Java, so you can probably call the engine from Flink.
I would start putting a flatMap() operator into a stream. In the operator,
I would start the Drools engine (probably in the open() method) and then
give the individual records to the engine.

Regards,
Robert


On Tue, Jun 21, 2016 at 7:55 AM, Anton <ku...@gmail.com> wrote:

> Hello
>
> Firstly, I am an absolute Flink newbie.
>
> I am interested in using Drools in Flink - in a similar case to what is
> described in this blog, where Drools is used in Spark.
>
> http://blog.cloudera.com/blog/2015/11/how-to-build-a-complex-event-processing-app-on-apache-spark-and-drools/
>
> The basic idea is that Drools can be used to reason over streaming data.
>
> The high-level use case is, there are several hundred users who want to
> write rules to be notified on events related to changes in specific
> streams. For example, notify me when a specific stock price changes by so
> much.
>
> Due to the number of users, the more end-user focused syntax of Drools, and
> the number of rule changes, doing this in Drools makes more sense than to
> write and deploy plain-old-flink (apologies, am not familiar with the
> correct term for a flink process).
>
> Also, as Drools has some powerful CEP operators, it could be very
> interested to have these available in Flink too.
>
> My question, therefore, is, very general - how best to integrate Drools
> into Flink? Where should I start?
>
> Thanks and regards
> Anton
>

Re: Advice - Drools in Flink

Posted by Maciek Próchniak <mp...@touk.pl>.
Hi Anton,

I think I start to understand what you want to achieve. I would be a 
very interesting thing to do ;)
If the drools are the piece that decides what is interesting and what is 
not, then probably also drools should be responisble for keeping 
state/aggregates - otherwise
Flink would keep too much stuff in memory.

I think that serializing session wouldn't be impossible. I still wonder 
how to deal with clocks in Drools CEP - maybe the clock should be moved 
by incoming events - but then there are problems with out of order ones, 
or only by watermarks - but then you have worse latency.
One thing that is still not so clear to me is - how do you want to 
partition the stream of tweets? I guess that it cannot be done by some 
tweet property - e.g. author, because otherwise users would have many 
sessions and aggregations would not be correct?
I may be wrong, but I think that to leverage Flink scaling/state 
capabilities the processing logic should be somehow Flink-aware - that 
is, Flink should understand (at least partially) what are specific rules...


maciek


On 23/06/2016 16:54, Anton wrote:
> Hi Maciek
>
> Firstly, thanks for your replies and interest. It is really appreciated. I
> am familiar with Drools but am new to Flink. Whereas you seem familiar with
> both - so your feedback is really appreciated.
>
> On Thu, Jun 23, 2016 at 8:30 AM, Maciek Pr�chniak <mp...@touk.pl> wrote:
>
>> you mean - keeping working memory facts in Flink state and with each event
>> throw them into stateless session?
>>
> Yes kind of.
>
>> Can you elaborate a bit on your use case? What would be the state?
>>
> *Use case*
> Flink ingests a large amount of tweets.
> Application users can write rules to reason of tweet contents and
> frequency.
> For example, user John writes a rule that says:
>
> *When I receive 3 tweets about Apache Flink within 1 day*
>
> *Then Flink is popular*
>
> *Fire new Popular("Flink") event*
>
>
>
> *When I have not received a tweet about Apache Attic project ___ in one
> month*
>
> *then this project is not popular*
> *Fire new UnPopular("Attic project") event*
>
>
> Another important aspect of the use-case is the frequency of rule
> changes/updates. There will be many users writing rules to reason over
> incoming events - and these rules are expected to change often.
>
> The main goal is to make Drools CEP stateful. Part of the challenge is
> ensuring that new messages are routed to the correct drools session. But,
> as Flink already seems to have scaling and state well done - it would be
> nice to combine the two.
> Please have a look at
> http://blog.athico.com/2016/03/high-availability-drools-stateless.html
>
> Also, I cannot comment on the strengths and weaknesses of Flink CEP, but
> Drools CEP is certainly very mature - and when combined with the rule
> language - very powerful. To be able to combine the statefulness of Flink
> with Drools would be amazing!
>
> The concern with using Stateful sessions is the possibility of memory
> leaks. However, a stateless session is just a wrapper around stateless -
> which calls dispose. So this makes me think, perhaps, when populating the
> working memory, just extract all relevant events from the Flink state into
> the Drools WM - and it could even be stateful - and then call displose
> after calling fireAllRules. And do this each time a new message/event
> arrives.
>
> Would it be for example some event aggregations, which would be filtered by
>> the rules?
> Yes.
>


Re: Advice - Drools in Flink

Posted by Anton <ku...@gmail.com>.
Hi Maciek

Firstly, thanks for your replies and interest. It is really appreciated. I
am familiar with Drools but am new to Flink. Whereas you seem familiar with
both - so your feedback is really appreciated.

On Thu, Jun 23, 2016 at 8:30 AM, Maciek Próchniak <mp...@touk.pl> wrote:

>
> you mean - keeping working memory facts in Flink state and with each event
> throw them into stateless session?
>
Yes kind of.

> Can you elaborate a bit on your use case? What would be the state?
>

*Use case*
Flink ingests a large amount of tweets.
Application users can write rules to reason of tweet contents and
frequency.
For example, user John writes a rule that says:

*When I receive 3 tweets about Apache Flink within 1 day*

*Then Flink is popular*

*Fire new Popular("Flink") event*



*When I have not received a tweet about Apache Attic project ___ in one
month*

*then this project is not popular*
*Fire new UnPopular("Attic project") event*


Another important aspect of the use-case is the frequency of rule
changes/updates. There will be many users writing rules to reason over
incoming events - and these rules are expected to change often.

The main goal is to make Drools CEP stateful. Part of the challenge is
ensuring that new messages are routed to the correct drools session. But,
as Flink already seems to have scaling and state well done - it would be
nice to combine the two.
Please have a look at
http://blog.athico.com/2016/03/high-availability-drools-stateless.html

Also, I cannot comment on the strengths and weaknesses of Flink CEP, but
Drools CEP is certainly very mature - and when combined with the rule
language - very powerful. To be able to combine the statefulness of Flink
with Drools would be amazing!

The concern with using Stateful sessions is the possibility of memory
leaks. However, a stateless session is just a wrapper around stateless -
which calls dispose. So this makes me think, perhaps, when populating the
working memory, just extract all relevant events from the Flink state into
the Drools WM - and it could even be stateful - and then call displose
after calling fireAllRules. And do this each time a new message/event
arrives.

Would it be for example some event aggregations, which would be filtered by
> the rules?

Yes.

Re: Advice - Drools in Flink

Posted by Maciek Próchniak <mp...@touk.pl>.
Hi Anton,

you mean - keeping working memory facts in Flink state and with each 
event throw them into stateless session?
Can you elaborate a bit on your use case? What would be the state?
Would it be for example some event aggregations, which would be filtered 
by the rules?

maciek

On 22/06/2016 21:19, Anton wrote:
> Thanks Maiek
> On Wed, Jun 22, 2016 at 9:00 PM, Maciek Pr�chniak <mp...@touk.pl> wrote:
>
>> This is straightforward, it also shouldn't be problematic performance wise
>> OTOH, if you want to use stateful sessions things are getting more
>> complicated - because if you want to play well with Flink you'd have to
>> integrate drools sessions with checkpoints in Flink - and that can be a bit
>> more tricky (although certainly possible)
>>
> Yes, this is exactly the point I wanted to discuss. And I know Drools
> stateful can be tricky even without using Flink.
>
> However, I was thinking - and am interested to know if this is possible -
> if it would be possible to do a puedo stateful drools but passing the state
> from flink into drools stateless. Does that make sense? And would it be
> possible?
>
> Thanks
>


Re: Advice - Drools in Flink

Posted by Anton <ku...@gmail.com>.
Thanks Maiek
On Wed, Jun 22, 2016 at 9:00 PM, Maciek Próchniak <mp...@touk.pl> wrote:

> This is straightforward, it also shouldn't be problematic performance wise
> OTOH, if you want to use stateful sessions things are getting more
> complicated - because if you want to play well with Flink you'd have to
> integrate drools sessions with checkpoints in Flink - and that can be a bit
> more tricky (although certainly possible)
>

Yes, this is exactly the point I wanted to discuss. And I know Drools
stateful can be tricky even without using Flink.

However, I was thinking - and am interested to know if this is possible -
if it would be possible to do a puedo stateful drools but passing the state
from flink into drools stateless. Does that make sense? And would it be
possible?

Thanks