Posted to user@flink.apache.org by Shailesh Jain <sh...@stellapps.com> on 2018/03/02 06:37:25 UTC

Question on event time functionality, using Flink in an IoT use case

Hi,

We're working on problems in the IoT domain and using Flink to address
certain use cases (predominantly CEP). There are multiple devices (of the
same type, e.g. temperature sensors) continuously pushing events. These (N)
devices are distinct and independent data sources, mostly residing at
different geographical locations. The clocks of all devices are based on
network time (synced with NTP servers).

One of our current pain points is having to create 'separate' data streams
per device, as opposed to a single keyed stream (keyed on the device id).
External factors like network loss, device reboots etc. cause certain
devices (and consequently their respective data streams) to lag behind, and
unfortunately it is not possible to predict an upper bound on this lag.
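
For illustration, our current topology looks roughly like the sketch below
(inside the job's main method; DeviceEvent, its accessors, the source and
the list of device ids are placeholders, not our actual code):

import org.apache.flink.streaming.api.TimeCharacteristic;
import org.apache.flink.streaming.api.datastream.DataStream;
import org.apache.flink.streaming.api.environment.StreamExecutionEnvironment;
import org.apache.flink.streaming.api.functions.timestamps.BoundedOutOfOrdernessTimestampExtractor;
import org.apache.flink.streaming.api.windowing.time.Time;

StreamExecutionEnvironment env = StreamExecutionEnvironment.getExecutionEnvironment();
env.setStreamTimeCharacteristic(TimeCharacteristic.EventTime);

// placeholder source producing events from all devices
DataStream<DeviceEvent> events = env.addSource(new DeviceEventSource());

for (String deviceId : knownDeviceIds) {  // placeholder list of device ids
    DataStream<DeviceEvent> perDevice = events
        .filter(e -> deviceId.equals(e.getDeviceId()))
        // each per-device stream carries its own watermark, so a lagging or
        // rebooting device only holds back its own stream
        .assignTimestampsAndWatermarks(
            new BoundedOutOfOrdernessTimestampExtractor<DeviceEvent>(Time.seconds(30)) {
                @Override
                public long extractTimestamp(DeviceEvent event) {
                    return event.getTimestamp();
                }
            });
    // a separate CEP pattern / window operator is then applied to each
    // perDevice stream, which is what makes the operator count grow with N
}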

We want to leverage the Event Time functionality provided by the framework,
but since per-key watermarks are not supported, the number of data streams
(and hence the number of duplicated CEP/window/etc. operators) scales
linearly with the number of devices deployed. This is proving to be a major
bottleneck for us.
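
For contrast, the single keyed stream we would prefer would look roughly as
follows (same placeholder names as above). With a single operator-wide
watermark, one lagging device either has its events treated as late or
forces a very conservative out-of-orderness bound for everyone:

import org.apache.flink.api.java.functions.KeySelector;
import org.apache.flink.streaming.api.datastream.KeyedStream;

DataStream<DeviceEvent> withTimestamps = events
    .assignTimestampsAndWatermarks(
        // this bound applies to the whole stream, across all devices
        new BoundedOutOfOrdernessTimestampExtractor<DeviceEvent>(Time.seconds(30)) {
            @Override
            public long extractTimestamp(DeviceEvent event) {
                return event.getTimestamp();
            }
        });

// a single CEP pattern / window operator would then run on the keyed stream,
// but event time progresses at the pace of the slowest device
KeyedStream<DeviceEvent, String> byDevice = withTimestamps
    .keyBy(new KeySelector<DeviceEvent, String>() {
        @Override
        public String getKey(DeviceEvent event) {
            return event.getDeviceId();
        }
    });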

Questions:

1. What is the reason behind not supporting per-key watermarks? -- i.e.
each operator could maintain N current-time variables/timers etc. as opposed
to a single 'clock' variable (a rough sketch of that idea is included after
the questions below). One reason I can guess at is "What happens if
downstream data streams and operators are not keyed?". Is this the only
limitation?

2. Is there some fundamental aspect of the framework which we have missed?
It would be really helpful if the community could point us to any existing
case studies similar to the use case mentioned above, so that we can make
sure we are on the right path forward.
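
To make question 1 concrete, the "N clocks" idea could look conceptually
like the following: per-key event time tracked in keyed state instead of one
operator-wide watermark. This is only a sketch of the concept (DeviceEvent
and its accessors are placeholders), not an existing Flink feature:

import org.apache.flink.api.common.functions.RichFlatMapFunction;
import org.apache.flink.api.common.state.ValueState;
import org.apache.flink.api.common.state.ValueStateDescriptor;
import org.apache.flink.configuration.Configuration;
import org.apache.flink.util.Collector;

// must be applied on a stream keyed by device id, so the state below is
// scoped per device
public class PerKeyClock extends RichFlatMapFunction<DeviceEvent, DeviceEvent> {

    private transient ValueState<Long> currentEventTime;  // one "clock" per device key

    @Override
    public void open(Configuration parameters) {
        currentEventTime = getRuntimeContext().getState(
                new ValueStateDescriptor<>("per-key-event-time", Long.class));
    }

    @Override
    public void flatMap(DeviceEvent event, Collector<DeviceEvent> out) throws Exception {
        Long current = currentEventTime.value();
        if (current == null || event.getTimestamp() > current) {
            // advance only this device's clock; other devices are unaffected
            currentEventTime.update(event.getTimestamp());
        }
        out.collect(event);
    }
}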

Thanks,
Shailesh