Posted to user@flink.apache.org by Ladhari Sadok <la...@gmail.com> on 2017/11/08 22:25:31 UTC

Broadcast to all the other operators

Hello,

I'm working on a Rules Engine project with Flink 1.3. In this project I want
to update some keyed operator state when an external event occurs.

I have a DataStream of updates (from Kafka) and I want to broadcast the data
contained in this stream to all keyed operators, so I can change the state
in all of them.

It is like this use case:
Image:
https://data-artisans.com/wp-content/uploads/2017/10/streaming-in-definitions.png
Full article:
https://data-artisans.com/blog/real-time-fraud-detection-ing-bank-apache-flink

I found it in the DataSet API but not in the DataStream API!

https://ci.apache.org/projects/flink/flink-docs-release-1.3/dev/batch/index.html#broadcast-variables

Can someone explain to me how to solve this problem?

Thanks a lot.

Flinkly regards,
Sadok

Re: Broadcast to all the other operators

Posted by Ladhari Sadok <la...@gmail.com>.
Ok thanks Tony, your answer is very helpful.


Re: Broadcast to all the other operators

Posted by Tony Wei <to...@gmail.com>.
Hi Sadok,

The sample code is just an example to show you how to broadcast the rules
to all subtasks; the output of the CoFlatMap does not have to be
Tuple2<Rule, Record>. It depends on what you actually need in your Rules
Engine project.
For example, if you can apply a rule to each record directly, you can emit
the processed records to the keyed operator.
IMHO, the scenario in the article you mentioned is having several
well-prepared rules to enrich data, and using DSL files to decide which
rules each incoming event needs. After enriching, the features for the
particular event are grouped by a random id and evaluated by the models.
I think this approach might be close to the solution in that article, but
it could differ somewhat depending on the use case.
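
To illustrate the "apply the rule directly" variant, here is a minimal
plain-Java sketch (no Flink dependency; the class and method names are
hypothetical stand-ins for the two sides of a CoFlatMap): the operator
holds the current rules, applies them to each record, and emits only the
processed record rather than a Tuple2<Rule, Record>.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical sketch: apply rules directly inside a CoFlatMap-style
// operator and emit processed records only, not Tuple2<Rule, Record>.
public class DirectRuleApplier {
    private final List<String> rules = new ArrayList<>();

    // Analogous to flatMap1 on the broadcast rule stream: update state.
    public void onRule(String rule) {
        rules.add(rule);
    }

    // Analogous to flatMap2 on the record stream: apply every current
    // rule to the record and emit the processed result downstream.
    public String onRecord(String record) {
        String processed = record;
        for (String rule : rules) {
            processed = processed + "|" + rule; // stand-in for real rule logic
        }
        return processed;
    }
}
```

With one rule "fraud-check" registered, onRecord("txn-1") yields
"txn-1|fraud-check", i.e. the record already processed by the rule.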

Best Regards,
Tony Wei



Fwd: Broadcast to all the other operators

Posted by Ladhari Sadok <la...@gmail.com>.
---------- Forwarded message ----------
From: Ladhari Sadok <la...@gmail.com>
Date: 2017-11-09 10:26 GMT+01:00
Subject: Re: Broadcast to all the other operators
To: Tony Wei <to...@gmail.com>


Thanks Tony for your very fast answer.

Yes, that resolves my problem, but with flatMap I will always get a
Tuple2<Rule, Record> in the processing function (<NULL, Record> when no
rule update is available and <newRule, Record> otherwise).
Is there no way to optimize this solution? Do you think it is the same
solution as in this picture:
https://data-artisans.com/wp-content/uploads/2017/10/streaming-in-definitions.png ?

Best regards,
Sadok



Re: Broadcast to all the other operators

Posted by Tony Wei <to...@gmail.com>.
Hi Sadok,

What I mean is to keep the rules in the operator state. The events in the
Rule Stream are just the change log for the rules.
To be more specific, you can fetch the rules from Redis in the open() step
of the CoFlatMap and keep them in the operator state, then use the Rule
Stream to notify the CoFlatMap to 1. update some rules or 2. refetch all
rules from Redis.
Is that what you want?
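
A minimal plain-Java sketch of this change-log approach (no Flink
dependency; the class name and the string-keyed rule representation are
hypothetical): rules live in local state, mirroring operator state, and a
rule event either updates one rule or triggers a full refetch.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical sketch of the change-log approach: rules are held in
// local state; a Rule Stream event either patches one rule or replaces
// the whole set with a fresh fetch from the external store (Redis).
public class RuleStateHolder {
    private final Map<String, String> rules = new HashMap<>();

    // Like CoFlatMap's open(): load all rules once at startup.
    public void open(Map<String, String> fetchedFromRedis) {
        rules.clear();
        rules.putAll(fetchedFromRedis);
    }

    // Rule Stream event, case 1: update a single rule in state.
    public void updateRule(String id, String definition) {
        rules.put(id, definition);
    }

    // Rule Stream event, case 2: refetch all rules from the store.
    public void refetchAll(Map<String, String> fetchedFromRedis) {
        open(fetchedFromRedis);
    }

    public String get(String id) {
        return rules.get(id);
    }
}
```

Only change-log events travel on the Rule Stream; the bulk of the rules
is loaded once in open() and then patched in place, so nothing is
streamed continuously.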

Best Regards,
Tony Wei


Re: Broadcast to all the other operators

Posted by Ladhari Sadok <la...@gmail.com>.
Thank you for the answer. I know that solution, but I don't want to stream
the rules all the time.
In my case the rules are in Redis, and they are loaded at Flink startup.

I want to broadcast changes only when they occur.

Thanks.


Re: Broadcast to all the other operators

Posted by Tony Wei <to...@gmail.com>.
Hi Sadok,

Since you want to broadcast the Rule Stream to all subtasks, it seems that
it is not necessary to use a KeyedStream.
How about using the broadcast partitioner, connecting the two streams to
attach the rule to each record (or applying the rule to it directly), and
doing the keyed operation after that?
If you need to do the keyed operation first and then apply the rules, it
should work by changing the order.

The code might be something like this, and you can change the rules' state
in the CoFlatMapFunction.

DataStream<Rule> rules = ...;
DataStream<Record> records = ...;
DataStream<Tuple2<Rule, Record>> recordWithRule =
    rules.broadcast().connect(records).flatMap(...);
recordWithRule.keyBy(...).process(...);
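
The CoFlatMapFunction passed to flatMap(...) could look like the
following plain-Java sketch (Flink types replaced so it stands alone;
names are hypothetical): one side updates the rule state, the other
pairs each record with the current rule, like Tuple2<Rule, Record>.

```java
// Hypothetical plain-Java mimic of the CoFlatMapFunction above:
// the broadcast side updates the rules' state, the record side emits
// (current rule, record) pairs. Before any rule arrives the pair is
// (null, record), matching the <NULL, Record> case from the thread.
public class AttachRule {
    private String currentRule; // the rules' state kept per subtask

    // Analogous to flatMap1 on the broadcast rule stream.
    public void onRule(String rule) {
        currentRule = rule;
    }

    // Analogous to flatMap2 on the record stream: emit the pair.
    public String[] onRecord(String record) {
        return new String[] { currentRule, record };
    }
}
```

Because every subtask receives the broadcast rule events, each parallel
instance of this function converges to the same rule state.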

Hope this will make sense to you.

Best Regards,
Tony Wei
