Posted to user@flink.apache.org by Charlie Moad <ch...@geofeedia.com> on 2016/11/17 21:36:04 UTC

Cross product of datastream and dataset

We're having trouble mapping our problem to Flink.

- For each incoming item
- Generate tuples of the item crossed with a data set
- Filter the tuples based on a condition
- Know the count of matching tuples

This seems to be a mashup of DataStream and DataSet, but it appears you can't
operate with those together. We are also wondering if kicking off a batch
job for each incoming item is a feasible approach. We don't have time
window constraints.

Any recommendations would be greatly appreciated.

- Charlie

Re: Cross product of datastream and dataset

Posted by Fabian Hueske <fh...@gmail.com>.
Hi,

it is not possible to mix the DataSet and DataStream APIs at the moment.

If the DataSet is constant and not too big (which I assume, since otherwise
crossing would be extremely expensive), you can load the data into a
stateful FlatMapFunction.
For that you can implement a RichFlatMapFunction and read the data in
open(). For each incoming record, i.e., each call of flatMap(), you cross it
with the records in the state and immediately evaluate the condition and
count. That way you emit only the count for each record instead of
generating every crossed tuple.
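
The approach above can be sketched roughly as follows. This is a plain-Java
stand-in with no Flink dependency so that it runs on its own; the class name
CrossAndCount, the Integer element type, and the condition are hypothetical.
In a real job the class would presumably extend
RichFlatMapFunction<Integer, Long>, open() would read the data set from
wherever it is stored, and the Consumer would be Flink's Collector.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.BiPredicate;
import java.util.function.Consumer;

// Hypothetical stand-in for a RichFlatMapFunction<Integer, Long>.
final class CrossAndCount {
    // Condition evaluated on each (incoming item, data set record) pair.
    private final BiPredicate<Integer, Integer> condition;
    // The constant "data set", held in memory by each function instance.
    private List<Integer> reference = new ArrayList<>();

    CrossAndCount(BiPredicate<Integer, Integer> condition) {
        this.condition = condition;
    }

    // Counterpart of open(): load the data set once, before any record arrives.
    // Here the records are passed in; a real open() would read them itself.
    void open(List<Integer> source) {
        reference = new ArrayList<>(source);
    }

    // Counterpart of flatMap(): cross the item with every stored record,
    // test the condition right away, and emit only the number of matches.
    void flatMap(Integer item, Consumer<Long> out) {
        long matches = 0;
        for (Integer ref : reference) {
            if (condition.test(item, ref)) {
                matches++;
            }
        }
        out.accept(matches);
    }
}
```

With the condition "record is smaller than the item" and the stored records
1, 5 and 10, an incoming 7 would yield a count of 2. No crossed tuple is ever
materialized; only one count per incoming item leaves the function.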

If your DataSet is slowly changing, you can think of using a stateful
CoFlatMapFunction and use one input to read the stream and the other to
update the data set.
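
A rough sketch of the two-input variant, again as a plain-Java stand-in
without a Flink dependency. The names UpdatableCrossAndCount, the String
keys, and the "null value means delete" convention are all assumptions made
for illustration. In Flink the two methods would correspond to flatMap1()
and flatMap2() of a CoFlatMapFunction, each of which also receives a
Collector; the update side simply emits nothing here.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.function.BiPredicate;
import java.util.function.Consumer;

// Hypothetical stand-in for a stateful CoFlatMapFunction.
final class UpdatableCrossAndCount {
    // Condition evaluated on each (incoming item, data set record) pair.
    private final BiPredicate<Integer, Integer> condition;
    // Current version of the data set, keyed so records can be replaced.
    private final Map<String, Integer> reference = new HashMap<>();

    UpdatableCrossAndCount(BiPredicate<Integer, Integer> condition) {
        this.condition = condition;
    }

    // First input (the item stream): cross, filter, and emit the count.
    void flatMap1(Integer item, Consumer<Long> out) {
        long matches = 0;
        for (Integer ref : reference.values()) {
            if (condition.test(item, ref)) {
                matches++;
            }
        }
        out.accept(matches);
    }

    // Second input (data set updates): insert or replace a record,
    // or delete it when the value is null. Nothing is emitted.
    void flatMap2(String key, Integer value) {
        if (value == null) {
            reference.remove(key);
        } else {
            reference.put(key, value);
        }
    }
}
```

Each item is counted against whatever version of the data set the function
holds when the item arrives. In a parallel job every function instance would
need to see every update, for example by broadcasting the update stream.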

Hope this helps,
Fabian

