You are viewing a plain text version of this content. The canonical link for it is here.
Posted to user@flink.apache.org by Michael Latta <ml...@technomage.com> on 2018/04/07 00:04:43 UTC
Multi-stream question
I would like to “join” several streams (>3) in a custom operator. Is this feasible in Flink?
Michael
Re: Multi-stream question
Posted by Puneet Kinra <pu...@customercentria.com>.
Seeking for the same answer, you want connect multiple streams?
On Sat, Apr 7, 2018 at 5:34 AM, Michael Latta <ml...@technomage.com> wrote:
> I would like to “join” several streams (>3) in a custom operator. Is this
> feasible in Flink?
>
>
> Michael
>
--
*Cheers *
*Puneet Kinra*
*Mobile:+918800167808 | Skype : puneet.kinra@customercentria.com
<pu...@customercentria.com>*
*e-mail :puneet.kinra@customercentria.com
<pu...@customercentria.com>*
Re: Multi-stream question
Posted by Fabian Hueske <fh...@gmail.com>.
Hi,
Ken's approach of having a joint data type and unioning the streams is
good. This will work seamlessly with checkpoints. Timo (in CC) used the
same approach to implement a prototype of a multi-way join.
A Tuple won't work though because the Tuple serializer does not support
null fields. You can use a Row or implement a custom, Either-like type.
Best, Fabian
TechnoMage <ml...@technomage.com> schrieb am Sa., 7. Apr. 2018, 17:25:
> Thanks for the Tuple suggestion, I may use that. I was asking about
> building a custom operator (just an idea). I have since decided I can
> decompose the problem into pairs of streams and emit a stream to the next
> CoFlatMap to get the result I need. Now to see if the idea works ...
>
> Michael
>
> > On Apr 7, 2018, at 1:10 PM, Ken Krugler <kk...@transpac.com>
> wrote:
> >
> > Hi Michael,
> >
> > There isn’t an operator that takes three (or more) streams, AFAIK.
> >
> > There is a CoFlatMapFunction that takes two different streams in, which
> could be used for some types of joins.
> >
> > Streaming joins are (typically) windowed (bounded), by
> time/count/something, so if you can maintain the required windowed state in
> a ProcessFunction then you can implement whatever custom logic is required
> for your join case.
> >
> > And for creating a unioned stream of multiple data types, one easy way
> is via (e.g.) Tuple3<POJO1, POJO2, POJO3>, where only one of the three
> fields is non-null for each tuple.
> >
> > -- Ken
> >
> > PS - I think the user@flink.apache.org <ma...@flink.apache.org>
> list is probably a better forum for this question.
> >
> >> On Apr 7, 2018, at 10:47 AM, TechnoMage <ml...@technomage.com> wrote:
> >>
> >> In my case I have more elaborate logic to select data from the
> streams. They are not all the same logical type, though I may be able to
> represent them as the same Java type. My main question is whether it is
> technically feasible to have a single operator that takes multiple streams
> as input. For example Operator(stream1, stream2, stream3) and produces an
> output stream. Can the checkpointing and other logic accomodate this if I
> write sufficient custom code in the operator?
> >>
> >> Michael
> >>
> >>> On Apr 7, 2018, at 10:42 AM, Ken Krugler <kk...@transpac.com>
> wrote:
> >>>
> >>> When you say “join” are you talking about a real join (so one or more
> fields can be used as a joining key), or some other operation?
> >>>
> >>> For more than two streams, you can do cascading window joins via
> multiple join()s that reduce your source streams down to a single stream.
> >>>
> >>> If the fields are the same across these streams, then a union()
> followed by say a ProcessFunction that implements your joining logic could
> work.
> >>>
> >>> Or you can convert all the streams to a common tuple format that
> consists of a unions the fields, so you can do a union() and then follow
> that with whatever logic is needed to actually do the join.
> >>>
> >>> Though I’m sure there are more elegant approaches :)
> >>>
> >>> — Ken
> >>>
> >>>
> >>>
> >>>> On Apr 6, 2018, at 5:04 PM, Michael Latta <ml...@technomage.com>
> wrote:
> >>>>
> >>>> I would like to “join” several streams (>3) in a custom operator. Is
> this feasible in Flink?
> >>>>
> >>>>
> >>>> Michael
> >>>
> >>> --------------------------------------------
> >>> http://about.me/kkrugler
> >>> +1 530-210-6378
> >>>
> >>
> >
> > --------------------------------------------
> > http://about.me/kkrugler
> > +1 530-210-6378
> >
>
>
Re: Multi-stream question
Posted by TechnoMage <ml...@technomage.com>.
Thanks for the Tuple suggestion, I may use that. I was asking about building a custom operator (just an idea). I have since decided I can decompose the problem into pairs of streams and emit a stream to the next CoFlatMap to get the result I need. Now to see if the idea works ...
Michael
> On Apr 7, 2018, at 1:10 PM, Ken Krugler <kk...@transpac.com> wrote:
>
> Hi Michael,
>
> There isn’t an operator that takes three (or more) streams, AFAIK.
>
> There is a CoFlatMapFunction that takes two different streams in, which could be used for some types of joins.
>
> Streaming joins are (typically) windowed (bounded), by time/count/something, so if you can maintain the required windowed state in a ProcessFunction then you can implement whatever custom logic is required for your join case.
>
> And for creating a unioned stream of multiple data types, one easy way is via (e.g.) Tuple3<POJO1, POJO2, POJO3>, where only one of the three fields is non-null for each tuple.
>
> -- Ken
>
> PS - I think the user@flink.apache.org <ma...@flink.apache.org> list is probably a better forum for this question.
>
>> On Apr 7, 2018, at 10:47 AM, TechnoMage <ml...@technomage.com> wrote:
>>
>> In my case I have more elaborate logic to select data from the streams. They are not all the same logical type, though I may be able to represent them as the same Java type. My main question is whether it is technically feasible to have a single operator that takes multiple streams as input. For example Operator(stream1, stream2, stream3) and produces an output stream. Can the checkpointing and other logic accomodate this if I write sufficient custom code in the operator?
>>
>> Michael
>>
>>> On Apr 7, 2018, at 10:42 AM, Ken Krugler <kk...@transpac.com> wrote:
>>>
>>> When you say “join” are you talking about a real join (so one or more fields can be used as a joining key), or some other operation?
>>>
>>> For more than two streams, you can do cascading window joins via multiple join()s that reduce your source streams down to a single stream.
>>>
>>> If the fields are the same across these streams, then a union() followed by say a ProcessFunction that implements your joining logic could work.
>>>
>>> Or you can convert all the streams to a common tuple format that consists of a unions the fields, so you can do a union() and then follow that with whatever logic is needed to actually do the join.
>>>
>>> Though I’m sure there are more elegant approaches :)
>>>
>>> — Ken
>>>
>>>
>>>
>>>> On Apr 6, 2018, at 5:04 PM, Michael Latta <ml...@technomage.com> wrote:
>>>>
>>>> I would like to “join” several streams (>3) in a custom operator. Is this feasible in Flink?
>>>>
>>>>
>>>> Michael
>>>
>>> --------------------------------------------
>>> http://about.me/kkrugler
>>> +1 530-210-6378
>>>
>>
>
> --------------------------------------------
> http://about.me/kkrugler
> +1 530-210-6378
>
Re: Multi-stream question
Posted by Ken Krugler <kk...@transpac.com>.
Hi Michael,
There isn’t an operator that takes three (or more) streams, AFAIK.
There is a CoFlatMapFunction that takes two different streams in, which could be used for some types of joins.
Streaming joins are (typically) windowed (bounded), by time/count/something, so if you can maintain the required windowed state in a ProcessFunction then you can implement whatever custom logic is required for your join case.
And for creating a unioned stream of multiple data types, one easy way is via (e.g.) Tuple3<POJO1, POJO2, POJO3>, where only one of the three fields is non-null for each tuple.
-- Ken
PS - I think the user@flink.apache.org <ma...@flink.apache.org> list is probably a better forum for this question.
> On Apr 7, 2018, at 10:47 AM, TechnoMage <ml...@technomage.com> wrote:
>
> In my case I have more elaborate logic to select data from the streams. They are not all the same logical type, though I may be able to represent them as the same Java type. My main question is whether it is technically feasible to have a single operator that takes multiple streams as input. For example Operator(stream1, stream2, stream3) and produces an output stream. Can the checkpointing and other logic accomodate this if I write sufficient custom code in the operator?
>
> Michael
>
>> On Apr 7, 2018, at 10:42 AM, Ken Krugler <kk...@transpac.com> wrote:
>>
>> When you say “join” are you talking about a real join (so one or more fields can be used as a joining key), or some other operation?
>>
>> For more than two streams, you can do cascading window joins via multiple join()s that reduce your source streams down to a single stream.
>>
>> If the fields are the same across these streams, then a union() followed by say a ProcessFunction that implements your joining logic could work.
>>
>> Or you can convert all the streams to a common tuple format that consists of a unions the fields, so you can do a union() and then follow that with whatever logic is needed to actually do the join.
>>
>> Though I’m sure there are more elegant approaches :)
>>
>> — Ken
>>
>>
>>
>>> On Apr 6, 2018, at 5:04 PM, Michael Latta <ml...@technomage.com> wrote:
>>>
>>> I would like to “join” several streams (>3) in a custom operator. Is this feasible in Flink?
>>>
>>>
>>> Michael
>>
>> --------------------------------------------
>> http://about.me/kkrugler
>> +1 530-210-6378
>>
>
--------------------------------------------
http://about.me/kkrugler
+1 530-210-6378
Re: Multi-stream question
Posted by TechnoMage <ml...@technomage.com>.
In my case I have more elaborate logic to select data from the streams. They are not all the same logical type, though I may be able to represent them as the same Java type. My main question is whether it is technically feasible to have a single operator that takes multiple streams as input. For example Operator(stream1, stream2, stream3) and produces an output stream. Can the checkpointing and other logic accomodate this if I write sufficient custom code in the operator?
Michael
> On Apr 7, 2018, at 10:42 AM, Ken Krugler <kk...@transpac.com> wrote:
>
> When you say “join” are you talking about a real join (so one or more fields can be used as a joining key), or some other operation?
>
> For more than two streams, you can do cascading window joins via multiple join()s that reduce your source streams down to a single stream.
>
> If the fields are the same across these streams, then a union() followed by say a ProcessFunction that implements your joining logic could work.
>
> Or you can convert all the streams to a common tuple format that consists of a unions the fields, so you can do a union() and then follow that with whatever logic is needed to actually do the join.
>
> Though I’m sure there are more elegant approaches :)
>
> — Ken
>
>
>
>> On Apr 6, 2018, at 5:04 PM, Michael Latta <ml...@technomage.com> wrote:
>>
>> I would like to “join” several streams (>3) in a custom operator. Is this feasible in Flink?
>>
>>
>> Michael
>
> --------------------------------------------
> http://about.me/kkrugler
> +1 530-210-6378
>