You are viewing a plain text version of this content. The canonical link for it is here.
Posted to user@flink.apache.org by Michael Latta <ml...@technomage.com> on 2018/04/07 00:04:43 UTC

Multi-stream question

I would like to “join” several streams (>3) in a custom operator. Is this feasible in Flink?


Michael

Re: Multi-stream question

Posted by Puneet Kinra <pu...@customercentria.com>.
Seeking for the same answer, you want connect multiple streams?


On Sat, Apr 7, 2018 at 5:34 AM, Michael Latta <ml...@technomage.com> wrote:

> I would like to “join” several streams (>3) in a custom operator. Is this
> feasible in Flink?
>
>
> Michael
>



-- 
*Cheers *

*Puneet Kinra*

*Mobile:+918800167808 | Skype : puneet.kinra@customercentria.com
<pu...@customercentria.com>*

*e-mail :puneet.kinra@customercentria.com
<pu...@customercentria.com>*

Re: Multi-stream question

Posted by Fabian Hueske <fh...@gmail.com>.
Hi,

Ken's approach of having a joint data type and unioning the streams is
good. This will work seamlessly with checkpoints. Timo (in CC) used the
same approach to implement a prototype of a multi-way join.

A Tuple won't work though because the Tuple serializer does not support
null fields. You can use a Row or implement a custom, Either-like type.

Best, Fabian


TechnoMage <ml...@technomage.com> schrieb am Sa., 7. Apr. 2018, 17:25:

> Thanks for the Tuple suggestion, I may use that.  I was asking about
> building a custom operator (just an idea).  I have since decided I can
> decompose the problem into pairs of streams and emit a stream to the next
> CoFlatMap to get the result I need.  Now to see if the idea works ...
>
> Michael
>
> > On Apr 7, 2018, at 1:10 PM, Ken Krugler <kk...@transpac.com>
> wrote:
> >
> > Hi Michael,
> >
> > There isn’t an operator that takes three (or more) streams, AFAIK.
> >
> > There is a CoFlatMapFunction that takes two different streams in, which
> could be used for some types of joins.
> >
> > Streaming joins are (typically) windowed (bounded), by
> time/count/something, so if you can maintain the required windowed state in
> a ProcessFunction then you can implement whatever custom logic is required
> for your join case.
> >
> > And for creating a unioned stream of multiple data types, one easy way
> is via (e.g.) Tuple3<POJO1, POJO2, POJO3>, where only one of the three
> fields is non-null for each tuple.
> >
> > -- Ken
> >
> > PS - I think the user@flink.apache.org <ma...@flink.apache.org>
> list is probably a better forum for this question.
> >
> >> On Apr 7, 2018, at 10:47 AM, TechnoMage <ml...@technomage.com> wrote:
> >>
> >> In my case I have more elaborate logic to select data from the
> streams.  They are not all the same logical type, though I may be able to
> represent them as the same Java type.  My main question is whether it is
> technically feasible to have a single operator that takes multiple streams
> as input.  For example Operator(stream1, stream2, stream3) and produces an
> output stream.  Can the checkpointing and other logic accomodate this if I
> write sufficient custom code in the operator?
> >>
> >> Michael
> >>
> >>> On Apr 7, 2018, at 10:42 AM, Ken Krugler <kk...@transpac.com>
> wrote:
> >>>
> >>> When you say “join” are you talking about a real join (so one or more
> fields can be used as a joining key), or some other operation?
> >>>
> >>> For more than two streams, you can do cascading window joins via
> multiple join()s that reduce your source streams down to a single stream.
> >>>
> >>> If the fields are the same across these streams, then a union()
> followed by say a ProcessFunction that implements your joining logic could
> work.
> >>>
> >>> Or you can convert all the streams to a common tuple format that
> consists of a unions the fields, so you can do a union() and then follow
> that with whatever logic is needed to actually do the join.
> >>>
> >>> Though I’m sure there are more elegant approaches :)
> >>>
> >>> — Ken
> >>>
> >>>
> >>>
> >>>> On Apr 6, 2018, at 5:04 PM, Michael Latta <ml...@technomage.com>
> wrote:
> >>>>
> >>>> I would like to “join” several streams (>3) in a custom operator. Is
> this feasible in Flink?
> >>>>
> >>>>
> >>>> Michael
> >>>
> >>> --------------------------------------------
> >>> http://about.me/kkrugler
> >>> +1 530-210-6378
> >>>
> >>
> >
> > --------------------------------------------
> > http://about.me/kkrugler
> > +1 530-210-6378
> >
>
>

Re: Multi-stream question

Posted by TechnoMage <ml...@technomage.com>.
Thanks for the Tuple suggestion, I may use that.  I was asking about building a custom operator (just an idea).  I have since decided I can decompose the problem into pairs of streams and emit a stream to the next CoFlatMap to get the result I need.  Now to see if the idea works ...

Michael

> On Apr 7, 2018, at 1:10 PM, Ken Krugler <kk...@transpac.com> wrote:
> 
> Hi Michael,
> 
> There isn’t an operator that takes three (or more) streams, AFAIK.
> 
> There is a CoFlatMapFunction that takes two different streams in, which could be used for some types of joins.
> 
> Streaming joins are (typically) windowed (bounded), by time/count/something, so if you can maintain the required windowed state in a ProcessFunction then you can implement whatever custom logic is required for your join case.
> 
> And for creating a unioned stream of multiple data types, one easy way is via (e.g.) Tuple3<POJO1, POJO2, POJO3>, where only one of the three fields is non-null for each tuple.
> 
> -- Ken
> 
> PS - I think the user@flink.apache.org <ma...@flink.apache.org> list is probably a better forum for this question.
> 
>> On Apr 7, 2018, at 10:47 AM, TechnoMage <ml...@technomage.com> wrote:
>> 
>> In my case I have more elaborate logic to select data from the streams.  They are not all the same logical type, though I may be able to represent them as the same Java type.  My main question is whether it is technically feasible to have a single operator that takes multiple streams as input.  For example Operator(stream1, stream2, stream3) and produces an output stream.  Can the checkpointing and other logic accomodate this if I write sufficient custom code in the operator?
>> 
>> Michael
>> 
>>> On Apr 7, 2018, at 10:42 AM, Ken Krugler <kk...@transpac.com> wrote:
>>> 
>>> When you say “join” are you talking about a real join (so one or more fields can be used as a joining key), or some other operation?
>>> 
>>> For more than two streams, you can do cascading window joins via multiple join()s that reduce your source streams down to a single stream.
>>> 
>>> If the fields are the same across these streams, then a union() followed by say a ProcessFunction that implements your joining logic could work.
>>> 
>>> Or you can convert all the streams to a common tuple format that consists of a unions the fields, so you can do a union() and then follow that with whatever logic is needed to actually do the join.
>>> 
>>> Though I’m sure there are more elegant approaches :)
>>> 
>>> — Ken
>>> 
>>> 
>>> 
>>>> On Apr 6, 2018, at 5:04 PM, Michael Latta <ml...@technomage.com> wrote:
>>>> 
>>>> I would like to “join” several streams (>3) in a custom operator. Is this feasible in Flink?
>>>> 
>>>> 
>>>> Michael
>>> 
>>> --------------------------------------------
>>> http://about.me/kkrugler
>>> +1 530-210-6378
>>> 
>> 
> 
> --------------------------------------------
> http://about.me/kkrugler
> +1 530-210-6378
> 


Re: Multi-stream question

Posted by Ken Krugler <kk...@transpac.com>.
Hi Michael,

There isn’t an operator that takes three (or more) streams, AFAIK.

There is a CoFlatMapFunction that takes two different streams in, which could be used for some types of joins.

Streaming joins are (typically) windowed (bounded), by time/count/something, so if you can maintain the required windowed state in a ProcessFunction then you can implement whatever custom logic is required for your join case.

And for creating a unioned stream of multiple data types, one easy way is via (e.g.) Tuple3<POJO1, POJO2, POJO3>, where only one of the three fields is non-null for each tuple.

-- Ken

PS - I think the user@flink.apache.org <ma...@flink.apache.org> list is probably a better forum for this question.

> On Apr 7, 2018, at 10:47 AM, TechnoMage <ml...@technomage.com> wrote:
> 
> In my case I have more elaborate logic to select data from the streams.  They are not all the same logical type, though I may be able to represent them as the same Java type.  My main question is whether it is technically feasible to have a single operator that takes multiple streams as input.  For example Operator(stream1, stream2, stream3) and produces an output stream.  Can the checkpointing and other logic accomodate this if I write sufficient custom code in the operator?
> 
> Michael
> 
>> On Apr 7, 2018, at 10:42 AM, Ken Krugler <kk...@transpac.com> wrote:
>> 
>> When you say “join” are you talking about a real join (so one or more fields can be used as a joining key), or some other operation?
>> 
>> For more than two streams, you can do cascading window joins via multiple join()s that reduce your source streams down to a single stream.
>> 
>> If the fields are the same across these streams, then a union() followed by say a ProcessFunction that implements your joining logic could work.
>> 
>> Or you can convert all the streams to a common tuple format that consists of a unions the fields, so you can do a union() and then follow that with whatever logic is needed to actually do the join.
>> 
>> Though I’m sure there are more elegant approaches :)
>> 
>> — Ken
>> 
>> 
>> 
>>> On Apr 6, 2018, at 5:04 PM, Michael Latta <ml...@technomage.com> wrote:
>>> 
>>> I would like to “join” several streams (>3) in a custom operator. Is this feasible in Flink?
>>> 
>>> 
>>> Michael
>> 
>> --------------------------------------------
>> http://about.me/kkrugler
>> +1 530-210-6378
>> 
> 

--------------------------------------------
http://about.me/kkrugler
+1 530-210-6378


Re: Multi-stream question

Posted by TechnoMage <ml...@technomage.com>.
In my case I have more elaborate logic to select data from the streams.  They are not all the same logical type, though I may be able to represent them as the same Java type.  My main question is whether it is technically feasible to have a single operator that takes multiple streams as input.  For example Operator(stream1, stream2, stream3) and produces an output stream.  Can the checkpointing and other logic accomodate this if I write sufficient custom code in the operator?

Michael

> On Apr 7, 2018, at 10:42 AM, Ken Krugler <kk...@transpac.com> wrote:
> 
> When you say “join” are you talking about a real join (so one or more fields can be used as a joining key), or some other operation?
> 
> For more than two streams, you can do cascading window joins via multiple join()s that reduce your source streams down to a single stream.
> 
> If the fields are the same across these streams, then a union() followed by say a ProcessFunction that implements your joining logic could work.
> 
> Or you can convert all the streams to a common tuple format that consists of a unions the fields, so you can do a union() and then follow that with whatever logic is needed to actually do the join.
> 
> Though I’m sure there are more elegant approaches :)
> 
> — Ken
> 
> 
> 
>> On Apr 6, 2018, at 5:04 PM, Michael Latta <ml...@technomage.com> wrote:
>> 
>> I would like to “join” several streams (>3) in a custom operator. Is this feasible in Flink?
>> 
>> 
>> Michael
> 
> --------------------------------------------
> http://about.me/kkrugler
> +1 530-210-6378
>