Posted to user@flink.apache.org by Hao Sun <ha...@zendesk.com> on 2018/03/16 03:38:30 UTC

Can not cancel with savepoint with Flink 1.3.2

Hi, I am running Flink on K8s and storing state in S3 with the RocksDB backend.

I used to be able to cancel with a savepoint through the REST API.
But sometimes the process never finishes, no matter how many times I try.

Is there a way to figure out what is going wrong?
Why is "isStoppable"=>false?

Thanks

==============================
[cancel_with_savepoint] progress: {"status"=>"in-progress",
"request-id"=>1}, job_id: 1392811585ca8cda779511008bce3046
==============================
[cancel_with_savepoint] job_status:
{"jid"=>"1392811585ca8cda779511008bce3046", "name"=>"KafkaDemo
maxwell.accounts (env:staging)", "isStoppable"=>false, "state"=>"RUNNING",
"start-time"=>1521169404370, "end-time"=>-1, "duration"=>1559274,
"now"=>1521170963644, "timestamps"=>{"CREATED"=>1521169404370,
"RUNNING"=>1521169404506, "FAILING"=>0, "FAILED"=>0, "CANCELLING"=>0,
"CANCELED"=>0, "FINISHED"=>0, "RESTARTING"=>1521168804370, "SUSPENDED"=>0,
"RECONCILING"=>0}, "vertices"=>[{"id"=>"2f4bc854a18755730e14a90e1d4d7c7d",
"name"=>"Source: KafkaSource(maxwell.accounts) ->
MaxwellFilter->Maxwell(maxwell.accounts) ->
FixedDelayWatermark(maxwell.accounts) ->
MaxwellFPSEvent->InfluxDBData(maxwell.accounts) -> Sink:
influxdbSink(maxwell.accounts)", "parallelism"=>1, "status"=>"RUNNING",
"start-time"=>1521169404506, "end-time"=>-1, "duration"=>1559138,
"tasks"=>{"CREATED"=>0, "SCHEDULED"=>0, "DEPLOYING"=>0, "RUNNING"=>1,
"FINISHED"=>0, "CANCELING"=>0, "CANCELED"=>0, "FAILED"=>0,
"RECONCILING"=>0}, "metrics"=>{"read-bytes"=>0, "write-bytes"=>0,
"read-records"=>0, "write-records"=>0}}], "status-counts"=>{"CREATED"=>0,
"SCHEDULED"=>0, "DEPLOYING"=>0, "RUNNING"=>1, "FINISHED"=>0,
"CANCELING"=>0, "CANCELED"=>0, "FAILED"=>0, "RECONCILING"=>0},
"plan"=>{"jid"=>"1392811585ca8cda779511008bce3046", "name"=>"KafkaDemo
maxwell.accounts (env:staging)",
"nodes"=>[{"id"=>"2f4bc854a18755730e14a90e1d4d7c7d", "parallelism"=>1,
"operator"=>"", "operator_strategy"=>"", "description"=>"Source:
KafkaSource(maxwell.accounts) -&gt;
MaxwellFilter-&gt;Maxwell(maxwell.accounts) -&gt;
FixedDelayWatermark(maxwell.accounts) -&gt;
MaxwellFPSEvent-&gt;InfluxDBData(maxwell.accounts) -&gt; Sink:
influxdbSink(maxwell.accounts)", "optimizer_properties"=>{}}]}}, job_id:
1392811585ca8cda779511008bce3046
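
The progress line above stays at "in-progress" with no upper bound on how long the caller waits. A minimal sketch of a bounded poll is shown here, so that a stuck request surfaces as an explicit timeout. The endpoint path is an assumption based on the Flink 1.3-era monitoring REST API and should be checked against the running version. The function name, its parameters, and the base URL are hypothetical. Only the "in-progress" status is taken from the output above; any other status is simply returned to the caller.

```python
import json
import time
import urllib.request


def http_get_json(url):
    """Fetch a URL and decode the JSON body."""
    with urllib.request.urlopen(url, timeout=10) as resp:
        return json.loads(resp.read().decode("utf-8"))


def wait_for_savepoint(base_url, job_id, request_id, fetch=http_get_json,
                       timeout_s=300.0, poll_s=5.0,
                       sleep=time.sleep, clock=time.monotonic):
    """Poll the cancel-with-savepoint progress until it leaves "in-progress".

    Returns the first reply whose status is not "in-progress".
    Raises TimeoutError once timeout_s has elapsed without such a reply.
    """
    # Assumed path; verify it against the REST API docs for your Flink version.
    url = (f"{base_url}/jobs/{job_id}"
           f"/cancel-with-savepoint/in-progress/{request_id}")
    deadline = clock() + timeout_s
    while True:
        reply = fetch(url)
        if reply.get("status") != "in-progress":
            return reply
        if clock() >= deadline:
            raise TimeoutError(
                f"request {request_id} for job {job_id} still in progress "
                f"after {timeout_s} seconds")
        sleep(poll_s)
```

A call such as wait_for_savepoint("http://jobmanager:8081", job_id, 1) (hypothetical host and port) would then either return the final reply or raise, which gives a clear point at which to go and inspect the JobManager and TaskManager logs.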

Re: Can not cancel with savepoint with Flink 1.3.2

Posted by Hao Sun <ha...@zendesk.com>.
Is this related?

2018-03-16 03:43:42,557 INFO  akka.actor.EmptyLocalActorRef
                 - Message
[org.apache.flink.runtime.metrics.dump.MetricDumpSerialization$MetricSerializationResult]
from Actor[akka.tcp://flink@fps-flink-taskmanager-120318156-9sw8l:43048/user/MetricQueryService_dff29d22e5adee13e761c957283d30ce#1096154549]
to Actor[akka://flink/temp/$mc] was not delivered. [156] dead letters
encountered. This logging can be turned off or adjusted with configuration
settings 'akka.log-dead-letters' and
'akka.log-dead-letters-during-shutdown'.
2018-03-16 03:43:45,460 INFO  akka.actor.EmptyLocalActorRef
                 - Message
[org.apache.flink.runtime.metrics.dump.MetricDumpSerialization$MetricSerializationResult]
from Actor[akka.tcp://flink@fps-flink-taskmanager-120318156-2vvkr:36163/user/MetricQueryService_eb07cde4f1affbbe48023c1a40516c1c#1874500433]
to Actor[akka://flink/temp/$nc] was not delivered. [157] dead letters
encountered. This logging can be turned off or adjusted with configuration
settings 'akka.log-dead-letters' and
'akka.log-dead-letters-during-shutdown'.
2018-03-16 03:43:50,412 INFO  akka.actor.EmptyLocalActorRef
                 - Message
[org.apache.flink.runtime.metrics.dump.MetricDumpSerialization$MetricSerializationResult]
from Actor[akka.tcp://flink@fps-flink-taskmanager-120318156-2vvkr:36163/user/MetricQueryService_eb07cde4f1affbbe48023c1a40516c1c#1874500433]
to Actor[akka://flink/temp/$pc] was not delivered. [158] dead letters
encountered. This logging can be turned off or adjusted with configuration
settings 'akka.log-dead-letters' and
'akka.log-dead-letters-during-shutdown'.
2018-03-16 03:44:05,430 INFO  akka.actor.EmptyLocalActorRef
                 - Message
[org.apache.flink.runtime.metrics.dump.MetricDumpSerialization$MetricSerializationResult]
from Actor[akka.tcp://flink@fps-flink-taskmanager-120318156-2vvkr:36163/user/MetricQueryService_eb07cde4f1affbbe48023c1a40516c1c#1874500433]
to Actor[akka://flink/temp/$sc] was not delivered. [159] dead letters
encountered. This logging can be turned off or adjusted with configuration
settings 'akka.log-dead-letters' and
'akka.log-dead-letters-during-shutdown'.
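
All four entries above carry a MetricSerializationResult sent from a MetricQueryService actor. To check whether a longer log contains dead letters of any other kind, the entries can be tallied by message type and sender host. The following is a rough sketch that assumes the log keeps the exact wording shown above; the function name is hypothetical, and the tally alone does not establish whether the dead letters are connected to the savepoint problem.

```python
import re
from collections import Counter

# Matches the Akka dead-letter wording seen in the log excerpt above.
DEAD_LETTER = re.compile(
    r"Message \[(?P<msg>[^\]]+)\] "
    r"from Actor\[(?P<sender>[^\]]+)\] "
    r"to Actor\[(?P<target>[^\]]+)\] was not delivered"
)
# Host part of a remote actor address such as akka.tcp://flink@host:port/...
HOST = re.compile(r"@([^:/\]]+)")


def summarize_dead_letters(log_text):
    """Count dead letters per (message class, sender host)."""
    # The pasted log is hard-wrapped, so collapse all whitespace first.
    flat = " ".join(log_text.split())
    counts = Counter()
    for match in DEAD_LETTER.finditer(flat):
        # Keep only the class name after the last package separator.
        msg = match.group("msg").rsplit(".", 1)[-1]
        host_match = HOST.search(match.group("sender"))
        host = host_match.group(1) if host_match else "local"
        counts[(msg, host)] += 1
    return counts
```

Running it over the full JobManager log and printing counts.most_common() shows at a glance whether anything other than metric dumps is being dropped.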