Posted to issues@flink.apache.org by pnowojski <gi...@git.apache.org> on 2017/10/04 16:22:50 UTC
[GitHub] flink pull request #4775: [FLINK-7739] Fix KafkaXXITCase tests stability
GitHub user pnowojski opened a pull request:
https://github.com/apache/flink/pull/4775
[FLINK-7739] Fix KafkaXXITCase tests stability
## What is the purpose of the change
This change fixes the stability of the Kafka*ITCase tests. The main fix is excluding the `netty` dependency from ZooKeeper. The other two changes are probably just cosmetic.
For more information, please see the individual commit messages.
You can merge this pull request into a Git repository by running:
$ git pull https://github.com/pnowojski/flink kafka-test2
Alternatively you can review and apply these changes as the patch at:
https://github.com/apache/flink/pull/4775.patch
To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:
This closes #4775
----
commit c7cc24d062aa233d86b68b7438c9a4e717003393
Author: Piotr Nowojski <pi...@gmail.com>
Date: 2017-09-29T16:23:29Z
[FLINK-7739][kafka-tests] Set shorter heartbeats intervals
The default pause value of 60 seconds is too large (tests would time out before Akka reacts)
commit 1677791f10153b9f7ecd552eac148d6ae3d056f1
Author: Piotr Nowojski <pi...@gmail.com>
Date: 2017-10-04T11:48:11Z
[FLINK-7739][kafka-tests] Set restart delay to non zero
Give TaskManagers some time to clean up before restarting a job.
commit 937c3fb388d9d7104b6336f59c3674bb70bfbf50
Author: Piotr Nowojski <pi...@gmail.com>
Date: 2017-10-04T14:50:57Z
[FLINK-7739] Exclude netty dependency from zookeeper
ZooKeeper was pulling in a conflicting Netty version. The conflict was
extremely subtle: the TaskManager in the Kafka tests was deadlocking in some
rare corner cases.
----
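A conflict like the one described in the last commit can be hard to spot because the wrong jar simply wins on the classpath. The following is a minimal sketch, not part of this pull request, of one way to check which location a class is actually loaded from. The class `ClassOrigin` and its `locate` helper are hypothetical names, and the two Netty class names in `main` are only illustrative guesses; the thread does not say which Netty classes or versions were involved.

```java
import java.security.CodeSource;

public class ClassOrigin {

    /**
     * Returns the location (usually a jar URL) that the named class is loaded from,
     * or a placeholder string if it comes from the JDK itself or cannot be found.
     */
    public static String locate(String className) {
        try {
            // initialize=false: inspect the class without running its static initializers
            Class<?> clazz = Class.forName(className, false, ClassOrigin.class.getClassLoader());
            CodeSource source = clazz.getProtectionDomain().getCodeSource();
            if (source == null || source.getLocation() == null) {
                // JDK classes typically have no code source
                return "<JDK/bootstrap class path>";
            }
            return source.getLocation().toString();
        } catch (ClassNotFoundException | LinkageError e) {
            return "<not on classpath>";
        }
    }

    public static void main(String[] args) {
        // Assumed example class names; pass others as arguments to check them instead.
        String[] names = args.length > 0 ? args : new String[] {
            "org.jboss.netty.channel.Channel",
            "io.netty.channel.Channel"
        };
        for (String name : names) {
            System.out.println(name + " -> " + locate(name));
        }
    }
}
```

Running this with the test classpath of the affected module would show which jar supplies each class. It only reports the copy that won; it does not list every jar that contains the class.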
---
[GitHub] flink pull request #4775: [FLINK-7739] Fix KafkaXXITCase tests stability
Posted by StephanEwen <gi...@git.apache.org>.
Github user StephanEwen commented on a diff in the pull request:
https://github.com/apache/flink/pull/4775#discussion_r143254179
--- Diff: flink-connectors/flink-connector-kafka-base/src/test/java/org/apache/flink/streaming/connectors/kafka/KafkaTestBase.java ---
@@ -121,10 +122,12 @@ public static void shutDownServices() throws Exception {
protected static Configuration getFlinkConfiguration() {
Configuration flinkConfig = new Configuration();
+ flinkConfig.setString(AkkaOptions.WATCH_HEARTBEAT_PAUSE, "5 s");
+ flinkConfig.setString(AkkaOptions.WATCH_HEARTBEAT_INTERVAL, "1 s");
flinkConfig.setInteger(ConfigConstants.LOCAL_NUMBER_TASK_MANAGER, NUM_TMS);
flinkConfig.setInteger(ConfigConstants.TASK_MANAGER_NUM_TASK_SLOTS, TM_SLOTS);
flinkConfig.setLong(TaskManagerOptions.MANAGED_MEMORY_SIZE, 16L);
- flinkConfig.setString(ConfigConstants.RESTART_STRATEGY_FIXED_DELAY_DELAY, "0 s");
+ flinkConfig.setString(ConfigConstants.RESTART_STRATEGY_FIXED_DELAY_DELAY, "5 s");
--- End diff --
If we can avoid this, we will save time during testing.
---
[GitHub] flink pull request #4775: [FLINK-7739] Fix KafkaXXITCase tests stability
Posted by pnowojski <gi...@git.apache.org>.
Github user pnowojski commented on a diff in the pull request:
https://github.com/apache/flink/pull/4775#discussion_r143162433
--- Diff: flink-connectors/flink-connector-kafka-base/src/test/java/org/apache/flink/streaming/connectors/kafka/KafkaTestBase.java ---
@@ -121,10 +122,13 @@ public static void shutDownServices() throws Exception {
protected static Configuration getFlinkConfiguration() {
Configuration flinkConfig = new Configuration();
+ flinkConfig.setString(AkkaOptions.WATCH_HEARTBEAT_PAUSE, "5 s");
+ flinkConfig.setString(AkkaOptions.WATCH_HEARTBEAT_INTERVAL, "1 s");
+ flinkConfig.setBoolean(AkkaOptions.LOG_LIFECYCLE_EVENTS, true);
--- End diff --
Yes, sure. I forgot to drop it; it was only for debugging purposes.
---
[GitHub] flink pull request #4775: [FLINK-7739] Fix KafkaXXITCase tests stability
Posted by tzulitai <gi...@git.apache.org>.
Github user tzulitai commented on a diff in the pull request:
https://github.com/apache/flink/pull/4775#discussion_r143149399
--- Diff: flink-connectors/flink-connector-kafka-base/src/test/java/org/apache/flink/streaming/connectors/kafka/KafkaTestBase.java ---
@@ -121,10 +122,13 @@ public static void shutDownServices() throws Exception {
protected static Configuration getFlinkConfiguration() {
Configuration flinkConfig = new Configuration();
+ flinkConfig.setString(AkkaOptions.WATCH_HEARTBEAT_PAUSE, "5 s");
+ flinkConfig.setString(AkkaOptions.WATCH_HEARTBEAT_INTERVAL, "1 s");
+ flinkConfig.setBoolean(AkkaOptions.LOG_LIFECYCLE_EVENTS, true);
--- End diff --
Can we omit this log?
---
[GitHub] flink issue #4775: [FLINK-7739] Fix KafkaXXITCase tests stability
Posted by StephanEwen <gi...@git.apache.org>.
Github user StephanEwen commented on the issue:
https://github.com/apache/flink/pull/4775
Thanks, will merge this without the added restart delay. If it is still unstable, we can add that back.
---
[GitHub] flink issue #4775: [FLINK-7739] Fix KafkaXXITCase tests stability
Posted by pnowojski <gi...@git.apache.org>.
Github user pnowojski commented on the issue:
https://github.com/apache/flink/pull/4775
@StephanEwen thanks for merging. We can do that.
---
[GitHub] flink issue #4775: [FLINK-7739] Fix KafkaXXITCase tests stability
Posted by pnowojski <gi...@git.apache.org>.
Github user pnowojski commented on the issue:
https://github.com/apache/flink/pull/4775
I have run ~500 Kafka09 tests on Travis, and the "`TaskManager` was lost"/"no more resources" problem is gone. However, in those 500 runs I have twice seen an `at-least-once` test failure (@tzulitai is looking into it).
---
[GitHub] flink pull request #4775: [FLINK-7739] Fix KafkaXXITCase tests stability
Posted by asfgit <gi...@git.apache.org>.
Github user asfgit closed the pull request at:
https://github.com/apache/flink/pull/4775
---