Posted to user@flink.apache.org by miki haiat <mi...@gmail.com> on 2018/05/31 16:14:28 UTC

File does not exist prevents the Job Manager from starting.

Hi,

I'm having a weird issue with JobManager (JM) recovery.
I'm using HDFS and ZooKeeper for an HA standalone cluster.

I stopped the cluster and changed some parameters (memory) in the Flink
configuration. But now, when I start the cluster again, I get an error
that prevents the JM from starting: somehow the checkpoint file no longer
exists in Hadoop, and the JM won't start.
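(For context, an HA standalone cluster of this kind is normally configured in
conf/flink-conf.yaml roughly as sketched below; the quorum hosts are
placeholders, while the storage directory and cluster id are inferred from the
failing path /flink1.5/ha/default/blob/... in the error further down:)

```yaml
# Sketch of ZooKeeper-based HA for a standalone cluster (placeholder hosts).
high-availability: zookeeper
high-availability.zookeeper.quorum: zk1:2181,zk2:2181,zk3:2181
# Job metadata (blobs, checkpoint handles) is persisted under this directory:
high-availability.storageDir: hdfs:///flink1.5/ha/
high-availability.cluster-id: default
```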

Full JM log file:
<https://gist.github.com/miko-code/28d57b32cb9c4f1aa96fa9873e10e53c>


> 2018-05-31 11:57:05,568 ERROR
> org.apache.flink.runtime.entrypoint.ClusterEntrypoint - Fatal error
> occurred in the cluster entrypoint.

Caused by: java.lang.Exception: Cannot set up the user code libraries: File
does not exist:
/flink1.5/ha/default/blob/job_5c545fc3f43d69325fb9966b8dd4c8f3/blob_p-5d9f3be555d3b05f90b5e148235d25730eb65b3d-ae486e221962f7b96e36da18fe1c57ca
at
org.apache.hadoop.hdfs.server.namenode.INodeFile.valueOf(INodeFile.java:72)

Re: File does not exist prevents the Job Manager from starting.

Posted by Till Rohrmann <tr...@apache.org>.
Hi Miki,

Flink first stores the checkpoint data in Hadoop before writing the handle
to the metadata in ZooKeeper. Thus, if the handle is in ZooKeeper, it
should also have been written to HDFS. Maybe you could check the HDFS logs
to see whether you find anything suspicious.

If ZooKeeper fails while writing the meta data state handle, then the
checkpoint should be automatically discarded. But you might want to
investigate why the ZooKeeper authentication failed. Flink needs a working
ZooKeeper quorum to run in HA mode.
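The write order described above (data to durable storage first, then the
handle to ZooKeeper, with the data discarded if the handle cannot be
committed) can be sketched as a toy two-step commit. All names below are
illustrative stand-ins, not Flink internals:

```java
import java.util.HashMap;
import java.util.Map;

public class CheckpointCommitSketch {
    // Stand-ins for the two systems: blobStore plays HDFS, metaStore
    // plays ZooKeeper (hypothetical names, not Flink classes).
    static final Map<String, String> blobStore = new HashMap<>();
    static final Map<String, String> metaStore = new HashMap<>();

    /**
     * Write the checkpoint data first, then commit the handle. If the
     * handle cannot be committed, discard the data, so that no handle
     * ever points at missing data.
     */
    static boolean storeCheckpoint(String id, String data, boolean zkFails) {
        blobStore.put(id, data);            // step 1: durable data ("HDFS")
        if (zkFails) {                      // simulated ZooKeeper failure
            blobStore.remove(id);           // discard the orphaned data
            return false;
        }
        metaStore.put(id, "handle:" + id);  // step 2: pointer ("ZooKeeper")
        return true;
    }

    public static void main(String[] args) {
        if (!storeCheckpoint("cp-1", "state-1", false)
                || !metaStore.containsKey("cp-1") || !blobStore.containsKey("cp-1")) {
            throw new AssertionError("successful checkpoint must leave data and handle");
        }
        if (storeCheckpoint("cp-2", "state-2", true)
                || metaStore.containsKey("cp-2") || blobStore.containsKey("cp-2")) {
            throw new AssertionError("failed commit must leave no trace");
        }
        System.out.println("every handle points at existing data");
    }
}
```

Under this ordering, a handle in ZooKeeper without data in HDFS should not
occur; the reverse (orphaned data without a handle) is the benign case.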

Maybe you could try to reproduce a failing run and share the log files with
us. They might be helpful to further investigate the problem.

Cheers,
Till


Re: File does not exist prevents the Job Manager from starting.

Posted by miki haiat <mi...@gmail.com> on 2018/06/06.
I had some ZooKeeper errors that crashed the cluster:

 ERROR org.apache.flink.shaded.org.apache.curator.ConnectionState
   - Authentication failed

What happens to the Flink checkpoints and state if the ZooKeeper cluster
crashes? Is it possible that the checkpoint/state was written to ZooKeeper
but not to Hadoop, so that when I try to restart the Flink cluster I get
the file-not-found error?



Re: File does not exist prevents the Job Manager from starting.

Posted by Till Rohrmann <tr...@apache.org> on 2018/06/04.
Hi Miki,

it looks as if you did not submit a job to the cluster whose logs you
shared. At least I could not see a job submission call.

Cheers,
Till


Re: File does not exist prevents the Job Manager from starting.

Posted by miki haiat <mi...@gmail.com> on 2018/06/04.
Hi Till,
I've managed to reproduce it.
Full log: faild_jm.log
<https://gist.githubusercontent.com/miko-code/e634164404354c4c590be84292fd8cb2/raw/baeee310cd50cfa79303b328e3334d960c8e98e6/faild_jm.log>





Re: File does not exist prevents the Job Manager from starting.

Posted by Till Rohrmann <tr...@apache.org> on 2018/06/04.
Hmmm, Flink should not delete the stored blobs on the HA storage. Could you
try to reproduce the problem and then send us the logs on DEBUG level?
Please also check, before shutting the cluster down, that the files are
there.
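(For readers reproducing this: DEBUG-level logs are typically enabled by
editing conf/log4j.properties on the JobManager/TaskManager machines before
restarting; a sketch of the usual change:)

```properties
# Raise the root logger from the default INFO to DEBUG before reproducing.
log4j.rootLogger=DEBUG, file
```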

Cheers,
Till


Re: File does not exist prevents the Job Manager from starting.

Posted by miki haiat <mi...@gmail.com> on 2018/06/03.
Hi Till,

   1. The files no longer exist in HDFS.
   2. Yes, I stop and start the cluster with the bin scripts.
   3. Unfortunately, I deleted the log. :(


I wondered whether this code could cause the issue, i.e. the way I'm using
checkpointing:

StateBackend sb = new FsStateBackend("hdfs://***/flink/my_city/checkpoints");
env.setStateBackend(sb);
env.getCheckpointConfig().setCheckpointingMode(CheckpointingMode.AT_LEAST_ONCE);
env.getCheckpointConfig().setCheckpointInterval(60000);
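(Aside: nothing in the snippet itself looks wrong. For reference, the same
configuration is often written with the one-call form below; this sketch
assumes `env` is a StreamExecutionEnvironment and is not taken from the
thread.)

```java
// Equivalent: enable checkpointing every 60 s with at-least-once mode.
env.setStateBackend(new FsStateBackend("hdfs://***/flink/my_city/checkpoints"));
env.enableCheckpointing(60000, CheckpointingMode.AT_LEAST_ONCE);
```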

Re: File does not exist prevents the Job Manager from starting.

Posted by Till Rohrmann <tr...@apache.org> on 2018/06/01.
Hi Miki,

could you check whether the files are really no longer stored on HDFS? How
did you terminate the cluster? Simply calling `bin/stop-cluster.sh`? I just
tried it locally and it could recover the job after calling
`bin/start-cluster.sh` again.

What would be helpful are the logs from the initial run of the job. So if
you can reproduce the problem, then this log would be very helpful.

Cheers,
Till
