Posted to user@flink.apache.org by miki haiat <mi...@gmail.com> on 2018/05/29 07:56:11 UTC

HA stand alone cluster error

I had a catastrophic error:

>
>  ERROR org.apache.flink.runtime.entrypoint.ClusterEntrypoint         -
> Fatal error occurred in the cluster entrypoint.
> org.apache.flink.util.FlinkException: Failed to recover job
> a048ad572c9837a400eca20cd55241b6.
> File does not exist:
> /flink_1.5/ha/beam1/blob/job_a048ad572c9837a400eca20cd55241b6/blob_p-45d544ca331844235e4f09e2a738b4de38a3bb0a-5dc3a8cbc69f56d9c824a7a4fddc131d



I was unable to start the cluster again, so I removed all the data from
Hadoop and cleaned ZooKeeper in order to be able to start it again.

But now I get this error:

> 2018-05-29 03:51:54,082 ERROR
> org.apache.flink.runtime.dispatcher.StandaloneDispatcher      - Could not
> recover job graph for job e3369e6dce5305b9411b4695975eea26.
> org.apache.flink.util.FlinkException: Could not retrieve submitted
> JobGraph from state handle under /e3369e6dce5305b9411b4695975eea26. This
> indicates that the retrieved state handle is broken. Try cleaning the state
> handle store.


How can I clean the state and bring the cluster back up?

Thanks,

Miki

Re: HA stand alone cluster error

Posted by Gary Yao <ga...@data-artisans.com>.
Hi Miki,

Sorry for the late reply. If you are able to reproduce the first problem, it
would be good to see the complete JobManager logs.

The second exception indicates that you have not removed all data from
ZooKeeper. On recovery, Flink looks up the locations of the submitted
JobGraphs in ZooKeeper. You can see for yourself which jobs will be
recovered by inspecting the contents of the znode
/flink/<namespace>/jobgraphs.
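
As a rough illustration of that check, here is a minimal Python sketch. It
assumes a ZooKeeper client object that follows the method names of kazoo's
KazooClient (exists, get_children, delete), and it assumes the root is
"/flink" and the namespace is "default"; substitute whatever your
flink-conf.yaml actually uses. The helper names (jobgraphs_path,
list_recoverable_jobs, remove_job_entry) are made up for this example and
are not part of Flink.

```python
import posixpath


def jobgraphs_path(root="/flink", namespace="default"):
    """Build the znode path /<root>/<namespace>/jobgraphs (defaults are assumptions)."""
    return posixpath.join("/", root.strip("/"), namespace.strip("/"), "jobgraphs")


def list_recoverable_jobs(zk, root="/flink", namespace="default"):
    """Return the job IDs registered under the jobgraphs znode, or [] if it is absent."""
    path = jobgraphs_path(root, namespace)
    if not zk.exists(path):
        return []
    return sorted(zk.get_children(path))


def remove_job_entry(zk, job_id, root="/flink", namespace="default"):
    """Delete one job's entry (and its children) so it is no longer recovered."""
    path = posixpath.join(jobgraphs_path(root, namespace), job_id)
    zk.delete(path, recursive=True)
    return path


# Possible usage with the third-party kazoo package (host and port are placeholders):
#   from kazoo.client import KazooClient
#   zk = KazooClient(hosts="localhost:2181")
#   zk.start()
#   print(list_recoverable_jobs(zk, namespace="default"))
#   zk.stop()
```

If a stale job ID such as the one in the second exception still shows up in
that list, removing its entry while the cluster is stopped is one way to
clean the state handle store. Treat this as a sketch and check the paths
against your own configuration before deleting anything.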

Best,
Gary
