Posted to issues@flink.apache.org by "Bowen Li (JIRA)" <ji...@apache.org> on 2017/08/09 00:15:02 UTC

[jira] [Commented] (FLINK-7223) Increase DEFAULT_SHARD_DISCOVERY_INTERVAL_MILLIS for Flink-kinesis-connector

    [ https://issues.apache.org/jira/browse/FLINK-7223?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16119222#comment-16119222 ] 

Bowen Li commented on FLINK-7223:
---------------------------------

I'm actually fine with the default value, since we are already aware of this issue and override the default value ourselves. But from an enterprise user's point of view, the assumption that 'this means running up to 50 Flink jobs per account' is not practical at all for a big AWS enterprise customer. Here's why:

In an enterprise AWS account, lots of other services contribute to saturating the 10 requests/sec limit. In our (OfferUp, Inc.) prod AWS account, we were hitting that cap even before we had Flink. Adding more and more Flink jobs makes things worse and breaks other services. Our Flink jobs are not more important than other services, so we can neither allocate resources solely to Flink nor compete with other services for them. Thus we set Flink's discovery interval to {{1 hour}}.
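
For reference, a minimal sketch of what such an override might look like in the consumer properties. The class and method names here are hypothetical, and the key string is assumed to match the value of {{ConsumerConfigConstants.SHARD_DISCOVERY_INTERVAL_MILLIS}}; please check it against the connector version in use.

```java
import java.util.Properties;
import java.util.concurrent.TimeUnit;

// Hypothetical helper class, for illustration only.
public class ShardDiscoveryConfig {

    // Assumption: this string matches the value of
    // ConsumerConfigConstants.SHARD_DISCOVERY_INTERVAL_MILLIS in the connector.
    static final String SHARD_DISCOVERY_INTERVAL_MILLIS =
            "flink.shard.discovery.intervalmillis";

    // Builds consumer properties that override the shard discovery interval.
    static Properties consumerConfig(long intervalMillis) {
        Properties props = new Properties();
        props.setProperty(SHARD_DISCOVERY_INTERVAL_MILLIS, Long.toString(intervalMillis));
        return props;
    }

    public static void main(String[] args) {
        // One hour, as described above.
        Properties props = consumerConfig(TimeUnit.HOURS.toMillis(1));
        System.out.println(props.getProperty(SHARD_DISCOVERY_INTERVAL_MILLIS));
    }
}
```

The resulting properties would then be passed to the Kinesis consumer together with the usual region and credential settings.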

It's probably not a topic worth a long discussion :) If you guys feel the default value is ok, I'm good with it.

> Increase DEFAULT_SHARD_DISCOVERY_INTERVAL_MILLIS for Flink-kinesis-connector
> ----------------------------------------------------------------------------
>
>                 Key: FLINK-7223
>                 URL: https://issues.apache.org/jira/browse/FLINK-7223
>             Project: Flink
>          Issue Type: Improvement
>          Components: Kinesis Connector
>    Affects Versions: 1.3.0
>            Reporter: Bowen Li
>            Assignee: Bowen Li
>            Priority: Minor
>             Fix For: 1.4.0
>
>
> Background: {{DEFAULT_SHARD_DISCOVERY_INTERVAL_MILLIS}} in {{org.apache.flink.streaming.connectors.kinesis.config.ConsumerConfigConstants}} is the default interval at which Flink calls Kinesis's {{describeStream()}} API.
> Problem: Right now, its value is 10,000 millis (10 sec), which is too short. We ran into problems where the Flink-kinesis-connector's calls to {{describeStream()}} exceeded the Kinesis rate limit and broke the Flink TaskManager.
> According to http://docs.aws.amazon.com/kinesis/latest/APIReference/API_DescribeStream.html, 
> "This operation has a limit of 10 transactions per second per account." This means the limit of 10 transactions per second applies to a single organization's entire AWS account... :(  We contacted AWS Support and confirmed this. If you have more applications (either other Flink apps or non-Flink apps) competing aggressively with your Flink app for this API, your Flink app breaks.
> I propose increasing the value of {{DEFAULT_SHARD_DISCOVERY_INTERVAL_MILLIS}} from 10,000 millis (10 sec) to preferably 300,000 millis (5 min), or at least 60,000 millis (1 min) if anyone has a solid reason to argue that 5 min is too long.
> This is also related to https://issues.apache.org/jira/browse/FLINK-6365
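
As a rough back-of-the-envelope sketch of the numbers in the description above: assuming each caller issues exactly one {{describeStream()}} call per discovery interval (a simplification; the real number of callers per job depends on parallelism and stream count), the account-wide call rate and the number of callers that fit under the limit of 10 calls per second can be estimated as follows. The class and method names are hypothetical.

```java
// Hypothetical estimate, for illustration only.
public class DescribeStreamBudget {

    // Aggregate calls per second if every caller makes one call per interval.
    static double callsPerSecond(int callers, long intervalMillis) {
        return callers * 1000.0 / intervalMillis;
    }

    // Largest number of such callers that stays within the per-second limit.
    static long maxCallers(double limitPerSecond, long intervalMillis) {
        return (long) Math.floor(limitPerSecond * intervalMillis / 1000.0);
    }

    public static void main(String[] args) {
        double limit = 10.0; // account-wide limit quoted from the AWS docs above
        for (long interval : new long[] {10_000L, 60_000L, 300_000L}) {
            System.out.println(interval + " ms -> at most "
                    + maxCallers(limit, interval) + " callers");
        }
    }
}
```

Under this assumption the budget scales linearly with the interval, and it is shared with every other application in the account that calls the same API.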



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)