You are viewing a plain text version of this content. The canonical link for it is here.
Posted to users@kafka.apache.org by Liam Clarke <li...@adscale.co.nz> on 2020/04/19 08:08:05 UTC

Unexpected behaviour on windowing aggregations

Hello all,

I have been running this code against production data, and I'm emitting
counts/sums for a sentinel record id to stdout so I can observe the
behaviour:

https://gist.github.com/LiamClarkeNZ/b101ce6a42a2e5e1efddfe3a98c5805f

When this code is run, the window duration is 2 minutes, grace period is 20
seconds, and retention time is 20 minutes.

I am endeavouring to use event time as the timestamp basis for this process:
https://gist.github.com/LiamClarkeNZ/8265cec02e21f5969e0fedb8281a2180

So, my sentinel debugging output shows a surprising behaviour in that the
outbound counts for the key always sum higher than the inbound count. For
example:

Sample: 2020-04-19T07:31:37.492Z

Inbound
{
    2020-04-19T03:00:00Z=4563,
    2020-04-19T04:00:00Z=5629,
    2020-04-19T05:00:00Z=8489,
    2020-04-19T06:00:00Z=13599
}

Outbound
{
    2020-04-19T03:00:00Z=4717,
    2020-04-19T04:00:00Z=5890,
    2020-04-19T05:00:00Z=8826,
    2020-04-19T06:00:00Z=13951
}

This makes me suspect that either I'm not using the window I thought I was
(e.g., I'm somehow using a sliding window instead of a tumbling window) or
that I have made a rookie error somewhere in my aggregations, or I've just
misunderstood something about this. Does it matter that the window size in
the persistent window store doesn't match the windowing time + grace time
in the windowing clause?

Any pointers gratefully welcome.

Kind regards,

Liam Clarke-Hutchinson

Re: Unexpected behaviour on windowing aggregations

Posted by Liam Clarke-Hutchinson <li...@adscale.co.nz>.
Yep, it sure was, now that I've removed the PEBKAC, numbers match up
perfectly :)

(A somewhat sheepish) thank you,

Liam

On Mon, 20 Apr. 2020, 2:51 pm John Roesler, <vv...@apache.org> wrote:

> Hey Liam,
>
> Hah! Tell me about it...
>
> Well, let’s hope that was it.
> Thanks,
> John
>
> On Sun, Apr 19, 2020, at 18:43, Liam Clarke wrote:
> > Hi John,
> >
> > Thanks for the reply - yep, that was a dumb copy and paste error, which
> is
> > what I get for coding while surrounded by kids. >_< I'm deploying a fixed
> > version of it as we speak. Thanks for the reply though :)
> >
> > Kind regards,
> >
> > Liam Clarke
> >
> >
> >
> > On Mon, 20 Apr. 2020, 2:08 am John Roesler, <vv...@apache.org> wrote:
> >
> > > Hi Liam,
> > >
> > > I took a quick look. On the output side, it looks like you’re adding
> the
> > > count to the prior count. Should that just set the outbound vale to
> the new
> > > count? Maybe I misunderstood the situation.
> > >
> > > What I mean is, suppose you get two events for the same window:
> > >
> > > Inbound map := 0+1 = 1
> > > Count = 1
> > > Outbound map := 0+1 = 1
> > > (Proposed outbound := 1)
> > >
> > > Then,
> > >
> > > Inbound map := 1+1 = 2
> > > Count = 2
> > > Outbound map := 1+2 = 3
> > > (Proposed outbound := 2)
> > >
> > > Does that make sense?
> > > -John
> > >
> > > On Sun, Apr 19, 2020, at 03:08, Liam Clarke wrote:
> > > > Hello all,
> > > >
> > > > I have been running this code against production data, and I'm
> emitting
> > > > counts/sums for a sentinel record id to stdout so I can observe the
> > > > behaviour:
> > > >
> > > >
> https://gist.github.com/LiamClarkeNZ/b101ce6a42a2e5e1efddfe3a98c5805f
> > > >
> > > > When this code is run, the window duration is 2 minutes, grace
> period is
> > > 20
> > > > seconds, and retention time is 20 minutes.
> > > >
> > > > I am endeavouring to use event time as the timestamp basis for this
> > > process:
> > > >
> https://gist.github.com/LiamClarkeNZ/8265cec02e21f5969e0fedb8281a2180
> > > >
> > > > So, my sentinel debugging output shows a surprising behaviour in
> that the
> > > > outbound counts for the key always sum higher than the inbound
> count. For
> > > > example:
> > > >
> > > > Sample: 2020-04-19T07:31:37.492Z
> > > >
> > > > Inbound
> > > > {
> > > >     2020-04-19T03:00:00Z=4563,
> > > >     2020-04-19T04:00:00Z=5629,
> > > >     2020-04-19T05:00:00Z=8489,
> > > >     2020-04-19T06:00:00Z=13599
> > > > }
> > > >
> > > > Outbound
> > > > {
> > > >     2020-04-19T03:00:00Z=4717,
> > > >     2020-04-19T04:00:00Z=5890,
> > > >     2020-04-19T05:00:00Z=8826,
> > > >     2020-04-19T06:00:00Z=13951
> > > > }
> > > >
> > > > This makes me suspect that either I'm not using the window I thought
> I
> > > was
> > > > (e.g., I'm somehow using a sliding window instead of a tumbling
> window)
> > > or
> > > > that I have made a rookie error somewhere in my aggregations, or I've
> > > just
> > > > misunderstood something about this. Does it matter that the window
> size
> > > in
> > > > the persistent window store doesn't match the windowing time + grace
> time
> > > > in the windowing clause?
> > > >
> > > > Any pointers gratefully welcome.
> > > >
> > > > Kind regards,
> > > >
> > > > Liam Clarke-Hutchinson
> > > >
> > >
> >
>

Re: Unexpected behaviour on windowing aggregations

Posted by John Roesler <vv...@apache.org>.
Hey Liam,

Hah! Tell me about it...

Well, let’s hope that was it. 
Thanks,
John

On Sun, Apr 19, 2020, at 18:43, Liam Clarke wrote:
> Hi John,
> 
> Thanks for the reply - yep, that was a dumb copy and paste error, which is
> what I get for coding while surrounded by kids. >_< I'm deploying a fixed
> version of it as we speak. Thanks for the reply though :)
> 
> Kind regards,
> 
> Liam Clarke
> 
> 
> 
> On Mon, 20 Apr. 2020, 2:08 am John Roesler, <vv...@apache.org> wrote:
> 
> > Hi Liam,
> >
> > I took a quick look. On the output side, it looks like you’re adding the
> > count to the prior count. Should that just set the outbound vale to the new
> > count? Maybe I misunderstood the situation.
> >
> > What I mean is, suppose you get two events for the same window:
> >
> > Inbound map := 0+1 = 1
> > Count = 1
> > Outbound map := 0+1 = 1
> > (Proposed outbound := 1)
> >
> > Then,
> >
> > Inbound map := 1+1 = 2
> > Count = 2
> > Outbound map := 1+2 = 3
> > (Proposed outbound := 2)
> >
> > Does that make sense?
> > -John
> >
> > On Sun, Apr 19, 2020, at 03:08, Liam Clarke wrote:
> > > Hello all,
> > >
> > > I have been running this code against production data, and I'm emitting
> > > counts/sums for a sentinel record id to stdout so I can observe the
> > > behaviour:
> > >
> > > https://gist.github.com/LiamClarkeNZ/b101ce6a42a2e5e1efddfe3a98c5805f
> > >
> > > When this code is run, the window duration is 2 minutes, grace period is
> > 20
> > > seconds, and retention time is 20 minutes.
> > >
> > > I am endeavouring to use event time as the timestamp basis for this
> > process:
> > > https://gist.github.com/LiamClarkeNZ/8265cec02e21f5969e0fedb8281a2180
> > >
> > > So, my sentinel debugging output shows a surprising behaviour in that the
> > > outbound counts for the key always sum higher than the inbound count. For
> > > example:
> > >
> > > Sample: 2020-04-19T07:31:37.492Z
> > >
> > > Inbound
> > > {
> > >     2020-04-19T03:00:00Z=4563,
> > >     2020-04-19T04:00:00Z=5629,
> > >     2020-04-19T05:00:00Z=8489,
> > >     2020-04-19T06:00:00Z=13599
> > > }
> > >
> > > Outbound
> > > {
> > >     2020-04-19T03:00:00Z=4717,
> > >     2020-04-19T04:00:00Z=5890,
> > >     2020-04-19T05:00:00Z=8826,
> > >     2020-04-19T06:00:00Z=13951
> > > }
> > >
> > > This makes me suspect that either I'm not using the window I thought I
> > was
> > > (e.g., I'm somehow using a sliding window instead of a tumbling window)
> > or
> > > that I have made a rookie error somewhere in my aggregations, or I've
> > just
> > > misunderstood something about this. Does it matter that the window size
> > in
> > > the persistent window store doesn't match the windowing time + grace time
> > > in the windowing clause?
> > >
> > > Any pointers gratefully welcome.
> > >
> > > Kind regards,
> > >
> > > Liam Clarke-Hutchinson
> > >
> >
>

Re: Unexpected behaviour on windowing aggregations

Posted by Liam Clarke <li...@adscale.co.nz>.
Hi John,

Thanks for the reply - yep, that was a dumb copy and paste error, which is
what I get for coding while surrounded by kids. >_< I'm deploying a fixed
version of it as we speak. Thanks for the reply though :)

Kind regards,

Liam Clarke



On Mon, 20 Apr. 2020, 2:08 am John Roesler, <vv...@apache.org> wrote:

> Hi Liam,
>
> I took a quick look. On the output side, it looks like you’re adding the
> count to the prior count. Should that just set the outbound vale to the new
> count? Maybe I misunderstood the situation.
>
> What I mean is, suppose you get two events for the same window:
>
> Inbound map := 0+1 = 1
> Count = 1
> Outbound map := 0+1 = 1
> (Proposed outbound := 1)
>
> Then,
>
> Inbound map := 1+1 = 2
> Count = 2
> Outbound map := 1+2 = 3
> (Proposed outbound := 2)
>
> Does that make sense?
> -John
>
> On Sun, Apr 19, 2020, at 03:08, Liam Clarke wrote:
> > Hello all,
> >
> > I have been running this code against production data, and I'm emitting
> > counts/sums for a sentinel record id to stdout so I can observe the
> > behaviour:
> >
> > https://gist.github.com/LiamClarkeNZ/b101ce6a42a2e5e1efddfe3a98c5805f
> >
> > When this code is run, the window duration is 2 minutes, grace period is
> 20
> > seconds, and retention time is 20 minutes.
> >
> > I am endeavouring to use event time as the timestamp basis for this
> process:
> > https://gist.github.com/LiamClarkeNZ/8265cec02e21f5969e0fedb8281a2180
> >
> > So, my sentinel debugging output shows a surprising behaviour in that the
> > outbound counts for the key always sum higher than the inbound count. For
> > example:
> >
> > Sample: 2020-04-19T07:31:37.492Z
> >
> > Inbound
> > {
> >     2020-04-19T03:00:00Z=4563,
> >     2020-04-19T04:00:00Z=5629,
> >     2020-04-19T05:00:00Z=8489,
> >     2020-04-19T06:00:00Z=13599
> > }
> >
> > Outbound
> > {
> >     2020-04-19T03:00:00Z=4717,
> >     2020-04-19T04:00:00Z=5890,
> >     2020-04-19T05:00:00Z=8826,
> >     2020-04-19T06:00:00Z=13951
> > }
> >
> > This makes me suspect that either I'm not using the window I thought I
> was
> > (e.g., I'm somehow using a sliding window instead of a tumbling window)
> or
> > that I have made a rookie error somewhere in my aggregations, or I've
> just
> > misunderstood something about this. Does it matter that the window size
> in
> > the persistent window store doesn't match the windowing time + grace time
> > in the windowing clause?
> >
> > Any pointers gratefully welcome.
> >
> > Kind regards,
> >
> > Liam Clarke-Hutchinson
> >
>

Re: Unexpected behaviour on windowing aggregations

Posted by John Roesler <vv...@apache.org>.
Hi Liam,

I took a quick look. On the output side, it looks like you’re adding the count to the prior count. Should that just set the outbound vale to the new count? Maybe I misunderstood the situation. 

What I mean is, suppose you get two events for the same window:

Inbound map := 0+1 = 1
Count = 1
Outbound map := 0+1 = 1
(Proposed outbound := 1)

Then,

Inbound map := 1+1 = 2
Count = 2
Outbound map := 1+2 = 3
(Proposed outbound := 2)

Does that make sense?
-John

On Sun, Apr 19, 2020, at 03:08, Liam Clarke wrote:
> Hello all,
> 
> I have been running this code against production data, and I'm emitting
> counts/sums for a sentinel record id to stdout so I can observe the
> behaviour:
> 
> https://gist.github.com/LiamClarkeNZ/b101ce6a42a2e5e1efddfe3a98c5805f
> 
> When this code is run, the window duration is 2 minutes, grace period is 20
> seconds, and retention time is 20 minutes.
> 
> I am endeavouring to use event time as the timestamp basis for this process:
> https://gist.github.com/LiamClarkeNZ/8265cec02e21f5969e0fedb8281a2180
> 
> So, my sentinel debugging output shows a surprising behaviour in that the
> outbound counts for the key always sum higher than the inbound count. For
> example:
> 
> Sample: 2020-04-19T07:31:37.492Z
> 
> Inbound
> {
>     2020-04-19T03:00:00Z=4563,
>     2020-04-19T04:00:00Z=5629,
>     2020-04-19T05:00:00Z=8489,
>     2020-04-19T06:00:00Z=13599
> }
> 
> Outbound
> {
>     2020-04-19T03:00:00Z=4717,
>     2020-04-19T04:00:00Z=5890,
>     2020-04-19T05:00:00Z=8826,
>     2020-04-19T06:00:00Z=13951
> }
> 
> This makes me suspect that either I'm not using the window I thought I was
> (e.g., I'm somehow using a sliding window instead of a tumbling window) or
> that I have made a rookie error somewhere in my aggregations, or I've just
> misunderstood something about this. Does it matter that the window size in
> the persistent window store doesn't match the windowing time + grace time
> in the windowing clause?
> 
> Any pointers gratefully welcome.
> 
> Kind regards,
> 
> Liam Clarke-Hutchinson
>