
Posted to user@flink.apache.org by Dominique De Vito <dd...@gmail.com> on 2020/01/25 17:00:07 UTC

is Flink supporting pre-loading of a compacted (reference) topic for a join ?

Hi,

I am looking for a "stream" join between:

* new data from a Kafka topic
* reference data from a compacted Kafka topic

DataStream.connect() works, but for a given key, the join may run before
Flink has read the corresponding reference data. This is, of course, a
problem.

Is there a way to load the content of the compacted topic before the join
actually starts?

According to some previously archived emails, and also to "FLIP-17", this
feature is not implemented. Still, I am not sure about its status in the
latest version, hence my question.

If it is not implemented, is there a way to work around the limitation and
achieve the expected result (the join)?

Thanks.

Regards,
Dominique

Re: is Flink supporting pre-loading of a compacted (reference) topic for a join ?

Posted by Dominique De Vito <dd...@gmail.com>.

Hi Gordon,

Thanks for your reply / help.

Yes, going down the savepoint road would certainly do the job, even if it
complicates the picture.

We might go that way in the future, but so far we have followed an easier
one, based on eventual consistency:

* if some reference data is not (yet) loaded when the join runs, then
do not join ;-)

(a) instead of doing a regular join (as expected), just set "not found" for
the missing property values and push the produced data (through the
collector) to the next tasks,
(b) store the incomplete data in some state and, once the reference data has
arrived, do the regular join and push the updated data to the next tasks
(through the collector).

So, it's all about eventual consistency.
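
To make (a) and (b) concrete, here is a minimal plain-Java sketch of the
per-key logic. It does not use the Flink API: the two maps are stand-ins for
keyed state, and the class name, method names and the "key|payload|reference"
output format are all hypothetical, chosen only for illustration. In a real
job, this logic would presumably sit in a two-input keyed function (for
example a KeyedCoProcessFunction) backed by Flink state.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

/** Plain-Java simulation of "emit a placeholder now, repair the join later". */
class EventualJoin {
    static final String NOT_FOUND = "not found";

    // Stand-ins for keyed state: latest reference value and buffered events, per key.
    private final Map<String, String> referenceByKey = new HashMap<>();
    private final Map<String, List<String>> pendingByKey = new HashMap<>();

    /** New record from the data topic: always emits exactly one joined record. */
    List<String> onEvent(String key, String payload) {
        List<String> out = new ArrayList<>();
        String ref = referenceByKey.get(key);
        if (ref == null) {
            // (a) emit with a placeholder, and (b) remember the record for later.
            pendingByKey.computeIfAbsent(key, k -> new ArrayList<>()).add(payload);
            out.add(format(key, payload, NOT_FOUND));
        } else {
            out.add(format(key, payload, ref));
        }
        return out;
    }

    /** New record from the reference topic: re-emits every buffered record, now joined. */
    List<String> onReference(String key, String value) {
        referenceByKey.put(key, value);
        List<String> out = new ArrayList<>();
        List<String> pending = pendingByKey.remove(key);
        if (pending != null) {
            for (String payload : pending) {
                out.add(format(key, payload, value));
            }
        }
        return out;
    }

    private static String format(String key, String payload, String ref) {
        return key + "|" + payload + "|" + ref;
    }

    public static void main(String[] args) {
        EventualJoin join = new EventualJoin();
        System.out.println(join.onEvent("k1", "order-1"));  // [k1|order-1|not found]
        System.out.println(join.onReference("k1", "ACME")); // [k1|order-1|ACME]
        System.out.println(join.onEvent("k1", "order-2"));  // [k1|order-2|ACME]
    }
}
```

Note that a record may be emitted twice (first with "not found", then with
the real value), so the downstream tasks have to treat the second emission as
an update of the first one.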

Thanks.

Regards,
Dominique





On Tue, Jan 28, 2020 at 03:43, Tzu-Li Tai <tz...@gmail.com> wrote:

> Hi Dominique,
>
> FLIP-17 (Side Inputs) is not yet implemented, AFAIK.
>
> One possible way to overcome this right now, if your reference data is
> static and not continuously changing, is to use the State Processor API to
> bootstrap a savepoint with the reference data.
> Have you looked into that to see if it would work for you?
>
> Cheers,
> Gordon
>
>
>
> --
> Sent from:
> http://apache-flink-user-mailing-list-archive.2336050.n4.nabble.com/
>

Re: is Flink supporting pre-loading of a compacted (reference) topic for a join ?

Posted by Tzu-Li Tai <tz...@gmail.com>.

Hi Dominique,

FLIP-17 (Side Inputs) is not yet implemented, AFAIK.

One possible way to overcome this right now, if your reference data is static
and not continuously changing, is to use the State Processor API to
bootstrap a savepoint with the reference data.
Have you looked into that to see if it would work for you?

Cheers,
Gordon


