Posted to user@flink.apache.org by Dulce Morim <Du...@i2s.pt> on 2018/03/26 15:18:18 UTC

Keyby connect for a one to many relationship - DataStream API - Ride Enrichment (CoProcessFunction)

Hello,

Following this exercise:
http://training.data-artisans.com/exercises/rideEnrichment-processfunction.html

I need to do something similar, but my data structure is something like:

A
Primary_key
other fields

B
Primary_key
Relation_Key
other fields

The relationship between A and B is one-to-many, joined on B.Relation_Key = A.Primary_key.

When I use keyBy on both streams, with "A.Primary_key" as the key on the A stream and "B.Relation_Key" as the key on the B stream, the output only shows the last B record among those that share the same "B.Relation_Key".

Is it possible to connect these two streams? The exercise's solution seems to assume a one-to-one relationship, but we need a one-to-many relationship. Should this be solved in some other way?

Thanks,
Dulce Morim

Re: Keyby connect for a one to many relationship - DataStream API - Ride Enrichment (CoProcessFunction)

Posted by Chesnay Schepler <ch...@apache.org>.
You can still connect the streams, but it will be more complex than the
reference solution.

You will have to store the events from B in a ListState instead of a
value state.
When an A arrives, store it in the value state, emit a tuple (A, B_x) for
every stored B, and clear the B list state.
From that point on, emit a new tuple (A, B) for every B that arrives
and leave the B list state unused.
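
The buffering logic described above can be sketched in plain Java, without any Flink dependency, to make the order of operations explicit. This is only a model under stated assumptions: the class OneToManyJoin, its methods onA/onB, and the use of String for both record types are hypothetical names chosen for illustration. In an actual job the two methods would roughly correspond to processElement1 and processElement2 of a CoProcessFunction on the keyed, connected streams, with the "a" field held in a ValueState and the "pendingBs" list held in a ListState.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

/** Plain-Java model of the per-key join logic (hypothetical names, no Flink). */
class OneToManyJoin {

    /** State kept per key: stands in for a ValueState (A) plus a ListState (B). */
    private static final class KeyState {
        String a;                                          // the A record, once seen
        final List<String> pendingBs = new ArrayList<>();  // Bs that arrived before A
    }

    // Flink scopes keyed state to the current key; this map imitates that.
    private final Map<String, KeyState> stateByKey = new HashMap<>();

    private KeyState stateFor(String key) {
        return stateByKey.computeIfAbsent(key, k -> new KeyState());
    }

    /** Handles an A record keyed by A.Primary_key; returns the (A, B) pairs to emit. */
    List<String[]> onA(String key, String a) {
        KeyState state = stateFor(key);
        state.a = a;                                // store A in the "value state"
        List<String[]> out = new ArrayList<>();
        for (String b : state.pendingBs) {
            out.add(new String[] {a, b});           // emit (A, B_x) for every stored B
        }
        state.pendingBs.clear();                    // the buffered Bs are no longer needed
        return out;
    }

    /** Handles a B record keyed by B.Relation_Key; returns the (A, B) pairs to emit. */
    List<String[]> onB(String key, String b) {
        KeyState state = stateFor(key);
        List<String[]> out = new ArrayList<>();
        if (state.a != null) {
            out.add(new String[] {state.a, b});     // A already known: emit immediately
        } else {
            state.pendingBs.add(b);                 // A not seen yet: buffer this B
        }
        return out;
    }
}
```

Note that this sketch never expires state: a B whose A never arrives stays buffered indefinitely, and each A is kept for as long as later Bs might arrive. Whether and when to clean that up depends on the data and is not covered here.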
