Posted to user@flink.apache.org by jaya sai <ja...@gmail.com> on 2019/07/25 22:48:27 UTC

question for handling db data

Hello,

I have a question on using Flink. We have a small data set which does not
change often, and another, much larger data set which we need to compare
against it.

Let's say I have two collections in MongoDB, geofences and locations. The
geofence collection does not change often and is relatively small, but
location data comes in at a high rate from clients, and we need to calculate
geofence entries and exits based on the geofence and location data points.
For calculating the entries and exits we were thinking of using Flink CEP. But
our problem is that the geofence data sometimes changes, and we need to update
Flink's in-memory store somehow.

We were thinking of bootstrapping the Flink processor's memory by loading the
data on initial start, then subscribing to a Kafka topic to listen for
geofence changes and re-pull the data.
Is this a valid approach?

Thank you,

Re: question for handling db data

Posted by Oytun Tez <oy...@motaword.com>.
Imagine an operator, a ProcessFunction, with two incoming inputs:
geofences via a broadcast stream,
user locations via a normal data stream.

Geofence updates and user location updates will arrive separately at this
single operator.

1)
When a geofence update arrives via broadcast, the operator will update its
state with the new geofence rules: this.geofenceListState =
myNewGeofenceListState.
This happens in the processBroadcastElement() method.

Whenever the geofence data is updated, it will arrive at this operator, in
processBroadcastElement(), and you will put the new geofence list into the
operator's state.

2)
When a user location update arrives at the operator via the regular stream,
you will access this.geofenceListState, do your calculations, and collect()
whatever you need to emit at the end of the computation.

The regular stream arrives, this time, at the processElement() method.
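To make the two callbacks concrete, here is a dependency-free sketch of the pattern in plain Java. This is not the actual Flink API: GeofenceOperator stands in for a Flink BroadcastProcessFunction, the plain field stands in for broadcast state, and all class and field names are made up for illustration.

```java
import java.util.ArrayList;
import java.util.List;

// Plain-Java sketch of the broadcast pattern described above.
// In a real Flink job, processBroadcastElement()/processElement() would be
// the callbacks of a BroadcastProcessFunction, and geofenceListState would
// live in Flink's broadcast state rather than in a field.
class GeofenceOperator {

    // Stand-in for a circular geofence rule.
    static class Geofence {
        final String id;
        final double cx, cy, radius;
        Geofence(String id, double cx, double cy, double radius) {
            this.id = id; this.cx = cx; this.cy = cy; this.radius = radius;
        }
    }

    // Operator "state": the slowly changing geofence rules.
    private List<Geofence> geofenceListState = new ArrayList<>();

    // Mirrors processBroadcastElement(): replace the stored rules wholesale.
    void processBroadcastElement(List<Geofence> newGeofences) {
        this.geofenceListState = new ArrayList<>(newGeofences);
    }

    // Mirrors processElement(): look up the current rules and return the ids
    // of every geofence containing this location.
    List<String> processElement(double x, double y) {
        List<String> matches = new ArrayList<>();
        for (Geofence g : geofenceListState) {
            double dx = x - g.cx, dy = y - g.cy;
            if (dx * dx + dy * dy <= g.radius * g.radius) {
                matches.add(g.id);
            }
        }
        return matches;
    }
}
```

Each broadcast update replaces the rule list, so any location processed afterwards sees the new rules, which is exactly the behavior described above.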

-

A geofence update will not affect elements previously collected from
processElement(). But Flink will make sure that all instances of this operator
across the various task managers get the same geofence update via
processBroadcastElement(), regardless of whether the operator is keyed.

Your processElement(), i.e. your user location updates, will do its
calculation against the latest geofence data received via
processBroadcastElement(). Once the geofence data is updated, user location
updates from that point on will use the new geofence data.
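Since the original question was about entry/exit detection specifically, here is a sketch of that calculation too, independent of Flink: remember which geofences each user was inside on their previous update, and diff that set against the current one. In a real job the per-user "previously inside" set would live in Flink keyed state (keyed by user id); all names here are invented for the example.

```java
import java.util.*;

// Illustrative entry/exit detection: diff the set of geofences a user is
// inside now against the set from their previous location update.
class EntryExitDetector {

    // geofence id -> circle as {centerX, centerY, radius}
    private Map<String, double[]> geofences = new HashMap<>();
    // user id -> geofence ids the user was inside at the last update
    private final Map<String, Set<String>> lastInside = new HashMap<>();

    // Stand-in for the broadcast-side update.
    void updateGeofences(Map<String, double[]> newGeofences) {
        this.geofences = new HashMap<>(newGeofences);
    }

    // Returns events like "ENTER:office" / "EXIT:office" for one update.
    List<String> onLocation(String userId, double x, double y) {
        // Which geofences contain the current location?
        Set<String> nowInside = new HashSet<>();
        for (Map.Entry<String, double[]> e : geofences.entrySet()) {
            double[] c = e.getValue();
            double dx = x - c[0], dy = y - c[1];
            if (dx * dx + dy * dy <= c[2] * c[2]) nowInside.add(e.getKey());
        }
        // Diff against the previous containment set to find transitions.
        Set<String> before = lastInside.getOrDefault(userId, Collections.emptySet());
        List<String> events = new ArrayList<>();
        for (String id : nowInside) if (!before.contains(id)) events.add("ENTER:" + id);
        for (String id : before) if (!nowInside.contains(id)) events.add("EXIT:" + id);
        lastInside.put(userId, nowInside);
        return events;
    }
}
```

Note one consequence of the broadcast pattern mentioned above: if a geofence's bounds change between two location updates, the next update is evaluated against the new bounds, which may itself produce an ENTER or EXIT event even though the user did not move.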

I hope this makes it clearer.






---
Oytun Tez

*M O T A W O R D*
The World's Fastest Human Translation Platform.
oytun@motaword.com — www.motaword.com


On Thu, Jul 25, 2019 at 7:08 PM jaya sai <ja...@gmail.com> wrote:

> Hi Oytun,
>
> Thanks for the quick reply, I will study this more.
>
> So when we have a stream, let's say in Kafka, for edits, deletes and inserts
> of geofences, and we add it to the Flink broadcast downstream, what happens
> if processing is taking place while we update the bounds of a geofence?
>
> When will the new data take effect, how do the updates take place, and is
> there any impact on the geofence events emitted for the location data?
>
> Thank you,
>
>
>
> On Thu, Jul 25, 2019 at 5:53 PM Oytun Tez <oy...@motaword.com> wrote:
>
>> Hi Jaya,
>>
>> Broadcast pattern may help here. Take a look at this:
>> https://ci.apache.org/projects/flink/flink-docs-release-1.8/dev/stream/state/broadcast_state.html
>>
>> You'll still keep your geofence data as a stream (depending on the data
>> and use case, maybe the whole list of geofences as a single stream item),
>> and broadcast that stream to downstream operators, which will then hold the
>> geofence data in their state as slowly changing data (processBroadcastElement),
>> while the user locations regularly arrive at the operator (processElement).
>>
>>
>>
>>
>>
>> ---
>> Oytun Tez
>>
>> *M O T A W O R D*
>> The World's Fastest Human Translation Platform.
>> oytun@motaword.com — www.motaword.com
>>
>>
>> On Thu, Jul 25, 2019 at 6:48 PM jaya sai <ja...@gmail.com> wrote:
>>
>>> Hello,
>>>
>>> I have a question on using Flink. We have a small data set which does
>>> not change often, and another, much larger data set which we need to
>>> compare against it.
>>>
>>> Let's say I have two collections in MongoDB, geofences and locations.
>>> The geofence collection does not change often and is relatively small, but
>>> location data comes in at a high rate from clients, and we need to
>>> calculate geofence entries and exits based on the geofence and location
>>> data points.
>>> For calculating the entries and exits we were thinking of using Flink CEP.
>>> But our problem is that the geofence data sometimes changes, and we need
>>> to update Flink's in-memory store somehow.
>>>
>>> We were thinking of bootstrapping the Flink processor's memory by
>>> loading the data on initial start, then subscribing to a Kafka topic to
>>> listen for geofence changes and re-pull the data.
>>> Is this a valid approach?
>>>
>>> Thank you,
>>>
>>
