Posted to user@flink.apache.org by Vijay Balakrishnan <bv...@gmail.com> on 2019/05/17 15:46:54 UTC

FlinkKinesisConsumer not getting data from Kinesis at a constant speed - lag of about 30-55 secs

Hi,
When using FlinkKinesisConsumer, I am seeing a lag of about 30-55 secs in
fetching data from Kinesis after it has done 1 or 2 fetches, even though
data is being put into the Kinesis data stream at a high rate.
I set ConsumerConfigConstants.SHARD_GETRECORDS_MAX to 10000 (also tried
5000, 200 etc.) and ConsumerConfigConstants.SHARD_GETRECORDS_INTERVAL_MILLIS
to 200 ms (the default works well here because of the limit of 5
transactions per sec from AWS). I have also tried reducing the interval,
but then I run into a readThroughput exception. How can I reduce this lag
to make it pretty much real-time? I am also using Flink processing time. I
have gone from 1 to 3 shards for the Kinesis data stream. Is there some
other tuning parameter I need to set for FlinkKinesisConsumer, or is it
just that it doesn't have any data to pull from Kinesis?
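
To make the interval trade-off above concrete, here is a minimal sketch of the arithmetic, not Flink code. It assumes one GetRecords call per shard per polling interval and ignores request latency, so the computed rate is an upper bound. The class and method names are hypothetical; the limit of 5 read transactions per second per shard is the AWS limit referred to above.

```java
// Sketch only: relates a GetRecords polling interval to the per-shard read limit.
// GetRecordsRate and its methods are hypothetical names, not part of Flink or the AWS SDK.
final class GetRecordsRate {

    // Kinesis allows up to 5 read transactions per second per shard.
    static final int MAX_READ_TPS_PER_SHARD = 5;

    // Upper bound on calls per second issued against one shard,
    // assuming one call per interval and zero request latency.
    static double callsPerSecond(long intervalMillis) {
        if (intervalMillis <= 0) {
            throw new IllegalArgumentException("intervalMillis must be positive");
        }
        return 1000.0 / intervalMillis;
    }

    // True if polling at this interval stays at or below the per-shard limit.
    static boolean withinLimit(long intervalMillis) {
        return callsPerSecond(intervalMillis) <= MAX_READ_TPS_PER_SHARD;
    }

    public static void main(String[] args) {
        for (long interval : new long[] {100, 200, 500}) {
            System.out.printf("interval=%d ms -> %.1f calls/s per shard, within limit: %b%n",
                    interval, callsPerSecond(interval), withinLimit(interval));
        }
    }
}
```

Under these assumptions, 200 ms sits exactly at the limit, which would be consistent with throughput errors appearing as soon as the interval is reduced.
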
I use 5 sec tumbling time windows and write the window end timestamp into
my InfluxDB timestamp column. I see a constant 35-55 sec lag in the
timestamps, which corresponds to the time lag I see in the logs where
FlinkKinesisConsumer is waiting to fetch data from Kinesis.
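
For reference, a minimal sketch of how a window end timestamp relates to a record's processing time. It assumes windows aligned to the epoch with no offset and an exclusive window end; the class and method names are hypothetical and this is plain arithmetic, not the Flink windowing API.

```java
// Sketch only: computes the tumbling window that a timestamp falls into.
// TumblingWindowEnd is a hypothetical helper, assuming epoch-aligned windows with no offset.
final class TumblingWindowEnd {

    // Inclusive start of the window containing timestampMillis.
    static long windowStart(long timestampMillis, long sizeMillis) {
        return timestampMillis - Math.floorMod(timestampMillis, sizeMillis);
    }

    // Exclusive end of the window containing timestampMillis.
    static long windowEnd(long timestampMillis, long sizeMillis) {
        return windowStart(timestampMillis, sizeMillis) + sizeMillis;
    }

    public static void main(String[] args) {
        long size = 5_000L; // 5 sec tumbling window
        long ts = 12_345L;  // example processing-time timestamp in ms
        System.out.println("start=" + windowStart(ts, size) + " end=" + windowEnd(ts, size));
    }
}
```

Under these assumptions, the window end is never more than one window size (5 sec here) later than the record's processing time, so the windowing alone would not account for a 35-55 sec offset.
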
I am seeing these log statements and am not sure what to make of them in
order to reduce the time lag in fetching data from Kinesis.
Logs:

23:23:40,286 [shardConsumers-Source: Custom Source -> (Map -> Sink:
Unnamed, Filter) (2/8)-thread-0] DEBUG
org.apache.flink.kinesis.shaded.com.amazonaws.requestId      [] -
x-amzn-RequestId: f06409aa-d996-fb3f-a53c-5c066d509c9b
23:23:40,335 [Source: Custom Source -> (Map -> Sink: Unnamed, Filter)
(2/8)] DEBUG
org.apache.flink.streaming.connectors.kinesis.internals.KinesisDataFetcher
[] - Subtask 1 is trying to discover new shards that were created due to
resharding ...


TIA,