Posted to issues@flink.apache.org by "Aljoscha Krettek (Jira)" <ji...@apache.org> on 2020/07/23 09:03:00 UTC

[jira] [Closed] (FLINK-18398) ElasticSearch unavailibility causes TM shutdown

     [ https://issues.apache.org/jira/browse/FLINK-18398?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Aljoscha Krettek closed FLINK-18398.
------------------------------------
    Resolution: Won't Fix

I'm closing this for now since it seems to be caused by https://github.com/elastic/elasticsearch/issues/47599 which we cannot fix from Flink.

> ElasticSearch unavailibility causes TM shutdown
> -----------------------------------------------
>
>                 Key: FLINK-18398
>                 URL: https://issues.apache.org/jira/browse/FLINK-18398
>             Project: Flink
>          Issue Type: Bug
>          Components: Connectors / ElasticSearch
>    Affects Versions: 1.10.0
>            Reporter: Alexander Fedulov
>            Priority: Critical
>         Attachments: elastic_jm_log.txt, elastic_tm_log.txt
>
>
> Similarly to [FLINK-17327|https://issues.apache.org/jira/browse/FLINK-17327], unavailability of the ElasticSearch cluster causes task cancellation to time out and the Task Manager to be killed. The following exceptions can be found in the logs:
>  
> {code:java}
> 2020-06-15 19:52:03.664Z ERROR [  I/O dispatcher 229] .f.s.c.e.ElasticsearchSinkBase : Failed Elasticsearch bulk request: request retries exceeded max retry timeout [30000]
> java.io.IOException: request retries exceeded max retry timeout [30000]
> ...
> 2020-06-15 19:55:03.861Z  WARN [43df85ee0f907ae9d0).] o.a.f.r.taskmanager.Task       : Task 'graph53 (1/1)' did not react to cancelling signal for 30 seconds, but is stuck in method:
>  org.elasticsearch.action.bulk.BulkProcessor.flush(BulkProcessor.java:356)
> ...
> 2020-06-15 19:55:04.120Z ERROR [663038f87ef09c4da6).] o.a.f.r.taskmanager.Task       : Task did not exit gracefully within 180 + seconds.
> 2020-06-15 19:55:04.121Z ERROR [663038f87ef09c4da6).] o.a.f.r.t.TaskExecutor         : Task did not exit gracefully within 180 + seconds.
> 2020-06-15 19:55:04.121Z ERROR [663038f87ef09c4da6).] o.a.f.r.t.TaskManagerRunner    : Fatal error occurred while executing the TaskManager. Shutting it down...
> {code}
> Detailed logs are attached.
>  



--
This message was sent by Atlassian Jira
(v8.3.4#803005)