Posted to issues@flink.apache.org by rmetzger <gi...@git.apache.org> on 2016/06/07 16:23:54 UTC

[GitHub] flink pull request #2080: [FLINK-3530] Fix Kafka08 instability: Avoid restar...

GitHub user rmetzger opened a pull request:

    https://github.com/apache/flink/pull/2080

    [FLINK-3530] Fix Kafka08 instability: Avoid restarts from SuccessExce…

    This pull request improves the stability of the Kafka tests.
    
    These tests inject one artificial failure that the code should recover from; then they throw a SuccessException once a stopping condition has been met.
    With the number of restarts set to 3, the job was restarting twice because of the SuccessExceptions. Sometimes task cancellation takes a long time, causing the test to exceed its 60-second timeout.
    
    This pull request sets the number of restarts to one: there will be one artificial test failure, followed by a SuccessException that finishes the test.
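
    The restart arithmetic above can be illustrated with a small, self-contained Java simulation. This is a hypothetical sketch, not Flink code: `RestartBudgetSimulation`, `ArtificialFailure`, and `runAttempt` are made-up names, and it assumes (as the description implies) that a SuccessException is treated like any other job failure and therefore consumes a restart until the budget is used up.

    ```java
    import java.util.ArrayList;
    import java.util.List;

    /** Hypothetical simulation (no Flink dependency) of a fixed restart budget. */
    public class RestartBudgetSimulation {

        /** Stand-in for the injected failure the job must recover from. */
        static class ArtificialFailure extends RuntimeException {}

        /** Stand-in for the test's SuccessException that signals "done". */
        static class SuccessException extends RuntimeException {}

        /** First attempt fails artificially; every later attempt ends by throwing SuccessException. */
        static void runAttempt(int attempt) {
            if (attempt == 0) {
                throw new ArtificialFailure();
            }
            throw new SuccessException();
        }

        /**
         * Restarts the job on any exception until maxRestarts is exhausted.
         * Returns the exception types seen, in order; the last one is what the job finally fails with.
         */
        public static List<String> execute(int maxRestarts) {
            List<String> failures = new ArrayList<>();
            int restarts = 0;
            while (true) {
                try {
                    runAttempt(restarts);
                    return failures; // not reached: every attempt throws in this simulation
                } catch (RuntimeException e) {
                    failures.add(e.getClass().getSimpleName());
                    if (restarts == maxRestarts) {
                        return failures; // budget exhausted, job fails with e
                    }
                    restarts++; // restart the job
                }
            }
        }

        public static void main(String[] args) {
            // [ArtificialFailure, SuccessException, SuccessException, SuccessException]
            System.out.println("3 restarts: " + execute(3));
            // [ArtificialFailure, SuccessException]
            System.out.println("1 restart:  " + execute(1));
        }
    }
    ```

    With a budget of 3, two of the three restarts are spent on SuccessExceptions, and each of those restarts implies a full cancellation of the running tasks. With a budget of 1, the single restart covers the artificial failure and the first SuccessException ends the run. The sketch deliberately does not show how the restart count is configured in Flink.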

You can merge this pull request into a Git repository by running:

    $ git pull https://github.com/rmetzger/flink flink3530

Alternatively you can review and apply these changes as the patch at:

    https://github.com/apache/flink/pull/2080.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

    This closes #2080
    
----
commit f7033a9e4426a8351a2039bd91db377d8bdc76b0
Author: Robert Metzger <rm...@apache.org>
Date:   2016-06-07T16:20:17Z

    [FLINK-3530] Fix Kafka08 instability: Avoid restarts from SuccessException

----


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

[GitHub] flink issue #2080: [FLINK-3530] Fix Kafka08 instability: Avoid restarts from...

Posted by StephanEwen <gi...@git.apache.org>.
Github user StephanEwen commented on the issue:

    https://github.com/apache/flink/pull/2080
  
    Do you know where the canceling loses the time?


[GitHub] flink pull request #2080: [FLINK-3530] Fix Kafka08 instability: Avoid restar...

Posted by asfgit <gi...@git.apache.org>.
Github user asfgit closed the pull request at:

    https://github.com/apache/flink/pull/2080


[GitHub] flink issue #2080: [FLINK-3530] Fix Kafka08 instability: Avoid restarts from...

Posted by rmetzger <gi...@git.apache.org>.
Github user rmetzger commented on the issue:

    https://github.com/apache/flink/pull/2080
  
    These are the relevant logs of the task:
    ```
    20:11:35,397 INFO  org.apache.flink.runtime.taskmanager.Task                     - Attempting to cancel task Source: Custom Source -> Map -> Map (5/8)
    20:11:35,398 INFO  org.apache.flink.runtime.taskmanager.Task                     - Source: Custom Source -> Map -> Map (5/8) switched to CANCELING
    20:11:35,398 INFO  org.apache.flink.runtime.taskmanager.Task                     - Triggering cancellation of task code Source: Custom Source -> Map -> Map (5/8) (217f8fe570f1c82eb4ec8191e1a73291).
    20:12:05,400 WARN  org.apache.flink.runtime.taskmanager.Task                     - Task 'Source: Custom Source -> Map -> Map (5/8)' did not react to cancelling signal, but is stuck in method:
    org.apache.flink.streaming.runtime.tasks.StreamTask.invoke(StreamTask.java:235)
    org.apache.flink.runtime.taskmanager.Task.run(Task.java:587)
    java.lang.Thread.run(Thread.java:745)
    
    20:12:05,510 INFO  org.apache.flink.runtime.taskmanager.Task                     - Source: Custom Source -> Map -> Map (5/8) switched to CANCELED
    ```
    
    And this is the code where it's waiting: https://github.com/apache/flink/blob/e7586c3b2d995be164100919d7c04db003a71a90/flink-streaming-java/src/main/java/org/apache/flink/streaming/runtime/tasks/StreamTask.java#L235
    
    I don't know exactly why the line numbers don't match (I would expect the code to block at the synchronized block). I've also checked the lines against the exact commit on which the error was triggered.
    
    I was not able to reproduce this issue locally. I suspect that somebody is not releasing the lock...
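
    One plausible mechanism behind this suspicion, shown as a plain-Java sketch under stated assumptions: if cancellation relies on interrupting the task thread, and that thread is waiting to enter a synchronized block whose monitor another thread never releases, the interrupt has no effect, because a thread BLOCKED on monitor entry does not respond to Thread.interrupt(). The class, lock, and thread names below are hypothetical and are not taken from StreamTask.

    ```java
    import java.util.concurrent.CountDownLatch;

    /** Hypothetical demo: a thread blocked on a monitor ignores interrupts. */
    public class StuckOnLockDemo {

        /** Returns the "task" thread's state observed 200 ms after it was interrupted. */
        public static Thread.State stateAfterInterrupt() {
            final Object lock = new Object(); // hypothetical shared lock
            final CountDownLatch lockTaken = new CountDownLatch(1);
            final CountDownLatch releaseLock = new CountDownLatch(1);

            // "Somebody" acquires the lock and keeps it until told to release it.
            Thread holder = new Thread(() -> {
                synchronized (lock) {
                    lockTaken.countDown();
                    try {
                        releaseLock.await();
                    } catch (InterruptedException e) {
                        Thread.currentThread().interrupt();
                    }
                }
            });

            // The "task" thread tries to enter the same synchronized block.
            Thread task = new Thread(() -> {
                synchronized (lock) {
                    // task code would run here
                }
            });

            try {
                holder.start();
                lockTaken.await();
                task.start();
                while (task.getState() != Thread.State.BLOCKED) {
                    Thread.sleep(1); // wait until the task is stuck on the monitor
                }
                task.interrupt(); // stand-in for the cancellation signal
                task.join(200);   // give the interrupt time to take effect (it won't)
                Thread.State observed = task.getState();

                releaseLock.countDown(); // release the lock so both threads can finish
                task.join();
                holder.join();
                return observed;
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
                throw new IllegalStateException(e);
            }
        }

        public static void main(String[] args) {
            System.out.println(stateAfterInterrupt()); // prints BLOCKED
        }
    }
    ```

    The task only makes progress once the holder releases the monitor, which would match a cancellation that appears to hang until some other thread lets go of the lock.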

