You are viewing a plain text version of this content. The canonical link for it is here.
Posted to user@flink.apache.org by Paul Wilson <pa...@gmail.com> on 2016/06/13 14:29:15 UTC

Custom Barrier?

Hi,

I've been evaluating Flink and wondering if it was possible to define a
window that is based on characteristics of the data (data driven) but not
contained in the data stream directly.

Consider 'nested events' where lower level events belong to a wider event
where the wider event serves only to define a boundary (or window) over the
lower level events. I was wondering if there was some way to communicate
this super-structure in the stream somehow?

I know that Flink users 'barriers' to define snapshot boundaries, but it
might it be possible to communicate a 'window end' in a similar fashion?

I guess I could attach an additional value to each event using a stateful
map function and then define the window on that?

e.g. A-Start, 1, 2, 3, A-End, B-Start, 1, 2, 3, B-End

Regards,
Paul

Re: Custom Barrier?

Posted by Aljoscha Krettek <al...@apache.org>.
Hi,
when you have a parallel input stream (for example multiple kafka
partitions that you read from) would you have the super events (A-Start,
B-Start and so on) in all of the parallel streams? If the answer is yes,
then you can probably abuse the watermarks mechanism to deal with it. If
not, then I'm afraid it's impossible to track process of these super events
across parallel partitions.

Depending on your answer to the above we might be able to figure something
out together.

Cheers,
Aljoscha

On Tue, 14 Jun 2016 at 16:19 Paul Wilson <pa...@gmail.com> wrote:

> ... and those events are in order
> On 14 Jun 2016 14:04, "Paul Wilson" <pa...@gmail.com> wrote:
>
>> Hi,
>>
>> No these super-structure events only serve the purpose of defining the
>> boundaries of a join, and do not relate to the keys of the sub-events.
>>
>> Thanks,
>> Paul
>>
>> On 14 June 2016 at 10:32, Aljoscha Krettek <al...@apache.org> wrote:
>>
>>> Hi,
>>> would these super-structure events occur per key? If yes, then I think
>>> you can process this using the currently available windowing mechanism by
>>> writing a custom WindowAssigner and Trigger. This, of course, assumes that
>>> the events arrive in-order, i.e. if A-End arrives before A-Start or if
>>> elements that should fall inside the A window arrive after A-End then I
>>> don't see an easy way to do it.
>>>
>>> Let me know if you need to know more about assigners/triggers.
>>>
>>> Cheers,
>>> Aljoscha
>>>
>>> On Mon, 13 Jun 2016 at 16:29 Paul Wilson <pa...@gmail.com>
>>> wrote:
>>>
>>>> Hi,
>>>>
>>>> I've been evaluating Flink and wondering if it was possible to define a
>>>> window that is based on characteristics of the data (data driven) but not
>>>> contained in the data stream directly.
>>>>
>>>> Consider 'nested events' where lower level events belong to a wider
>>>> event where the wider event serves only to define a boundary (or window)
>>>> over the lower level events. I was wondering if there was some way to
>>>> communicate this super-structure in the stream somehow?
>>>>
>>>> I know that Flink users 'barriers' to define snapshot boundaries, but
>>>> it might it be possible to communicate a 'window end' in a similar fashion?
>>>>
>>>> I guess I could attach an additional value to each event using a
>>>> stateful map function and then define the window on that?
>>>>
>>>> e.g. A-Start, 1, 2, 3, A-End, B-Start, 1, 2, 3, B-End
>>>>
>>>> Regards,
>>>> Paul
>>>>
>>>
>>

Re: Custom Barrier?

Posted by Paul Wilson <pa...@gmail.com>.
... and those events are in order
On 14 Jun 2016 14:04, "Paul Wilson" <pa...@gmail.com> wrote:

> Hi,
>
> No these super-structure events only serve the purpose of defining the
> boundaries of a join, and do not relate to the keys of the sub-events.
>
> Thanks,
> Paul
>
> On 14 June 2016 at 10:32, Aljoscha Krettek <al...@apache.org> wrote:
>
>> Hi,
>> would these super-structure events occur per key? If yes, then I think
>> you can process this using the currently available windowing mechanism by
>> writing a custom WindowAssigner and Trigger. This, of course, assumes that
>> the events arrive in-order, i.e. if A-End arrives before A-Start or if
>> elements that should fall inside the A window arrive after A-End then I
>> don't see an easy way to do it.
>>
>> Let me know if you need to know more about assigners/triggers.
>>
>> Cheers,
>> Aljoscha
>>
>> On Mon, 13 Jun 2016 at 16:29 Paul Wilson <pa...@gmail.com>
>> wrote:
>>
>>> Hi,
>>>
>>> I've been evaluating Flink and wondering if it was possible to define a
>>> window that is based on characteristics of the data (data driven) but not
>>> contained in the data stream directly.
>>>
>>> Consider 'nested events' where lower level events belong to a wider
>>> event where the wider event serves only to define a boundary (or window)
>>> over the lower level events. I was wondering if there was some way to
>>> communicate this super-structure in the stream somehow?
>>>
>>> I know that Flink users 'barriers' to define snapshot boundaries, but it
>>> might it be possible to communicate a 'window end' in a similar fashion?
>>>
>>> I guess I could attach an additional value to each event using a
>>> stateful map function and then define the window on that?
>>>
>>> e.g. A-Start, 1, 2, 3, A-End, B-Start, 1, 2, 3, B-End
>>>
>>> Regards,
>>> Paul
>>>
>>
>

Re: Custom Barrier?

Posted by Paul Wilson <pa...@gmail.com>.
Hi,

No these super-structure events only serve the purpose of defining the
boundaries of a join, and do not relate to the keys of the sub-events.

Thanks,
Paul

On 14 June 2016 at 10:32, Aljoscha Krettek <al...@apache.org> wrote:

> Hi,
> would these super-structure events occur per key? If yes, then I think you
> can process this using the currently available windowing mechanism by
> writing a custom WindowAssigner and Trigger. This, of course, assumes that
> the events arrive in-order, i.e. if A-End arrives before A-Start or if
> elements that should fall inside the A window arrive after A-End then I
> don't see an easy way to do it.
>
> Let me know if you need to know more about assigners/triggers.
>
> Cheers,
> Aljoscha
>
> On Mon, 13 Jun 2016 at 16:29 Paul Wilson <pa...@gmail.com> wrote:
>
>> Hi,
>>
>> I've been evaluating Flink and wondering if it was possible to define a
>> window that is based on characteristics of the data (data driven) but not
>> contained in the data stream directly.
>>
>> Consider 'nested events' where lower level events belong to a wider event
>> where the wider event serves only to define a boundary (or window) over the
>> lower level events. I was wondering if there was some way to communicate
>> this super-structure in the stream somehow?
>>
>> I know that Flink users 'barriers' to define snapshot boundaries, but it
>> might it be possible to communicate a 'window end' in a similar fashion?
>>
>> I guess I could attach an additional value to each event using a stateful
>> map function and then define the window on that?
>>
>> e.g. A-Start, 1, 2, 3, A-End, B-Start, 1, 2, 3, B-End
>>
>> Regards,
>> Paul
>>
>

Re: Custom Barrier?

Posted by Aljoscha Krettek <al...@apache.org>.
Hi,
would these super-structure events occur per key? If yes, then I think you
can process this using the currently available windowing mechanism by
writing a custom WindowAssigner and Trigger. This, of course, assumes that
the events arrive in-order, i.e. if A-End arrives before A-Start or if
elements that should fall inside the A window arrive after A-End then I
don't see an easy way to do it.

Let me know if you need to know more about assigners/triggers.

Cheers,
Aljoscha

On Mon, 13 Jun 2016 at 16:29 Paul Wilson <pa...@gmail.com> wrote:

> Hi,
>
> I've been evaluating Flink and wondering if it was possible to define a
> window that is based on characteristics of the data (data driven) but not
> contained in the data stream directly.
>
> Consider 'nested events' where lower level events belong to a wider event
> where the wider event serves only to define a boundary (or window) over the
> lower level events. I was wondering if there was some way to communicate
> this super-structure in the stream somehow?
>
> I know that Flink users 'barriers' to define snapshot boundaries, but it
> might it be possible to communicate a 'window end' in a similar fashion?
>
> I guess I could attach an additional value to each event using a stateful
> map function and then define the window on that?
>
> e.g. A-Start, 1, 2, 3, A-End, B-Start, 1, 2, 3, B-End
>
> Regards,
> Paul
>