Posted to user@flink.apache.org by Arvid Heise <ar...@ververica.com> on 2020/11/30 20:33:34 UTC

Re: Anomalous spikes in aggregations of keyed data

Hi Mark,

Could you double-check whether these spikes co-occur with checkpointing? If
there is an alignment, certain channels are blocked from taking in data. If
all of these keys are more or less contained in a shard with less data, that
would explain why only these keys are affected.
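
One way to run that check offline is sketched below. It is only a rough
illustration: the input shapes (per-key window sums and a list of checkpoint
trigger times with durations), the function names, and the 10x threshold are
all assumptions, not something taken from your job or from a Flink API.

```python
from statistics import median


def find_spikes(sums, factor=10.0):
    """Return window starts whose total exceeds factor * median total.

    sums: list of (window_start_epoch_s, total) for ONE key, sorted by time.
    The median is used as a rough "normal level"; the factor is arbitrary.
    """
    totals = [total for _, total in sums]
    if not totals:
        return []
    baseline = median(totals)
    if baseline <= 0:
        return []
    return [start for start, total in sums if total > factor * baseline]


def overlaps_checkpoint(window_start, checkpoints, window_s=60):
    """True if any checkpoint was in progress during the given window.

    checkpoints: list of (trigger_epoch_s, duration_s), a hypothetical
    export of checkpoint history; the window is [start, start + window_s).
    """
    window_end = window_start + window_s
    return any(
        trigger < window_end and trigger + duration > window_start
        for trigger, duration in checkpoints
    )


if __name__ == "__main__":
    key_sums = [(0, 100), (60, 110), (120, 2600), (180, 105)]
    slow_checkpoints = [(130, 20)]  # only checkpoints that took unusually long
    for start in find_spikes(key_sums):
        print(start, overlaps_checkpoint(start, slow_checkpoints))
```

With a checkpoint every minute, nearly every window will overlap some
checkpoint, so it is probably more telling to feed in only the checkpoints
that took unusually long.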

On Mon, Nov 30, 2020 at 9:27 PM Kegel, Mark <Ma...@disneystreaming.com>
wrote:

> We have a high volume (600-700 shards) kinesis data stream that we are
> doing a simple keying and aggregation on. The logic is very simple: kinesis
> source, key by fields (A,B,C), window (1-minute, tumbling), aggregate by
> summing over integer field R, connect to sink.
>
>
>
> We are seeing some anomalous spikes in our aggregations. From one minute
> to the next, the sum total for one particular key may increase 25x or more
> and then drop back down to a normal level, yet sums for other keys in the
> same window remain roughly the same, which we expect.
>
>
>
> We don’t see this too often. Maybe 1-5 data points (key + timestamp) in an
> hour’s worth of 1-minute windowed data will have these spikes. The data has
> fairly low cardinality. There are only roughly two hundred distinct keys.
>
>
>
> We inspected the raw kinesis stream and found no duplicates. It isn’t
> clear how these spikes could happen or what we might do to work around the
> issue since the code is as idiomatic as possible.
>
>
>
> We are running the job as part of Kinesis Data Analytics, which is using
> Flink version 1.8. To connect to Kinesis we are using the
> amazon-kinesis-connection-flink library (v1.0.4) and the EFO consumer
> mode.
>
>
>
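
For cross-checking the job's output against the raw stream, the logic quoted
above (key by (A, B, C), 1-minute tumbling window, sum over R) can be
recomputed offline. The sketch below is plain Python, not Flink code; the
record layout and the "ts" field (epoch seconds) are assumptions, and it
buckets by whatever timestamp you pass in, which may not match the time
characteristic the job actually uses.

```python
from collections import defaultdict

WINDOW_S = 60  # 1-minute tumbling windows


def tumbling_sums(records, window_s=WINDOW_S):
    """Sum field R per ((A, B, C), window_start).

    records: iterable of dicts with keys "A", "B", "C", "R" and "ts",
    where "ts" is a hypothetical timestamp field in epoch seconds.
    """
    sums = defaultdict(int)
    for rec in records:
        # Align the timestamp down to the start of its window.
        window_start = rec["ts"] - rec["ts"] % window_s
        key = (rec["A"], rec["B"], rec["C"])
        sums[(key, window_start)] += rec["R"]
    return dict(sums)


if __name__ == "__main__":
    sample = [
        {"A": "a", "B": "b", "C": "c", "R": 2, "ts": 5},
        {"A": "a", "B": "b", "C": "c", "R": 3, "ts": 59},
        {"A": "a", "B": "b", "C": "c", "R": 7, "ts": 60},
    ]
    for (key, start), total in sorted(tumbling_sums(sample).items()):
        print(key, start, total)
```

Comparing these sums with the sink output for a spiking key and window would
show whether the extra volume is really in the stream for that minute or
whether records from neighbouring windows ended up in the wrong one.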


-- 

Arvid Heise | Senior Java Developer

<https://www.ververica.com/>

Follow us @VervericaData

--

Join Flink Forward <https://flink-forward.org/> - The Apache Flink
Conference

Stream Processing | Event Driven | Real Time

--

Ververica GmbH | Invalidenstrasse 115, 10115 Berlin, Germany

--
Ververica GmbH
Registered at Amtsgericht Charlottenburg: HRB 158244 B
Managing Directors: Timothy Alexander Steinert, Yip Park Tung Jason, Ji
(Toni) Cheng

Re: Anomalous spikes in aggregations of keyed data

Posted by "Kegel, Mark" <Ma...@disneystreaming.com>.
At the moment we checkpoint every minute. I can turn this frequency down but I’m not sure that will fix/hide the issue.

Mark
