Posted to user@flink.apache.org by Radu Tudoran <ra...@huawei.com> on 2016/04/21 15:44:41 UTC

lost connection

Hi,

I am trying to submit a jar via the console (flink run my.jar). The submission fails with an error saying that communication with the JobManager failed: "Lost connection to the JobManager."
Can you give me some hints or recommendations on how to approach this issue?

Thanks

Dr. Radu Tudoran
Research Engineer - Big Data Expert
IT R&D Division

HUAWEI TECHNOLOGIES Duesseldorf GmbH
European Research Center
Riesstrasse 25, 80992 München

E-mail: radu.tudoran@huawei.com
Mobile: +49 15209084330
Telephone: +49 891588344173

HUAWEI TECHNOLOGIES Duesseldorf GmbH
Hansaallee 205, 40549 Düsseldorf, Germany, www.huawei.com
Registered Office: Düsseldorf, Register Court Düsseldorf, HRB 56063,
Managing Director: Bo PENG, Wanzhou MENG, Lifang CHEN
Sitz der Gesellschaft: Düsseldorf, Amtsgericht Düsseldorf, HRB 56063,
Geschäftsführer: Bo PENG, Wanzhou MENG, Lifang CHEN
This e-mail and its attachments contain confidential information from HUAWEI, which is intended only for the person or entity whose address is listed above. Any use of the information contained herein in any way (including, but not limited to, total or partial disclosure, reproduction, or dissemination) by persons other than the intended recipient(s) is prohibited. If you receive this e-mail in error, please notify the sender by phone or email immediately and delete it!


RE: lost connection

Posted by Radu Tudoran <ra...@huawei.com>.
Yes, it suddenly occurred on a setup that used to work. I am restarting the deployment to see if this solves the problem.


Re: lost connection

Posted by Chesnay Schepler <ch...@apache.org>.
That is an excerpt from the client log. Can you check the JobManager log?
It could have crashed, and if so, the cause is hopefully in there.

Did this issue occur suddenly, that is, have you run a job successfully on
the system before? (This would rule out network configuration issues.)
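
As a rough way to tell these cases apart, the sketch below attempts a plain TCP connection to the JobManager address and classifies the result. The helper name probe_tcp and the 3-second default timeout are assumptions made for illustration; they are not part of Flink. A "refused" result would be consistent with no process listening on that port (for example, a crashed JobManager), while a "timeout" points more toward routing or firewall problems. Neither result is conclusive on its own.

```python
import socket


def probe_tcp(host, port, timeout=3.0):
    """Try a plain TCP connect and classify the outcome.

    Returns "open" if something accepted the connection, "refused" if the
    host answered but nothing is listening on the port, "timeout" if no
    answer arrived in time, and "error: ..." for any other OS-level failure.
    """
    try:
        # create_connection resolves the host and performs the TCP handshake.
        with socket.create_connection((host, port), timeout=timeout):
            return "open"
    except ConnectionRefusedError:
        return "refused"
    except socket.timeout:
        return "timeout"
    except OSError as exc:
        return f"error: {exc}"
```

For the address reported in this thread, the call would be probe_tcp("10.204.62.71", 6123), run from the machine that executes flink run.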

Regards,
Chesnay


RE: lost connection

Posted by Radu Tudoran <ra...@huawei.com>.
- Could not submit job Operator2 execution (170aef70d31f3fee62f8a483930be213), because there is no connection to a JobManager.
15:59:48,456 WARN  Remoting                                                      - Tried to associate with unreachable remote address [akka.tcp://flink@10.204.62.71:6123]. Address is now gated for 5000 ms, all messages to this address will be delivered to dead letters. Reason: Connection refused: /10.204.62.71:6123
16:01:28,409 ERROR org.apache.flink.client.CliFrontend                           - Error while running the command.
org.apache.flink.client.program.ProgramInvocationException: The program execution failed: Communication with JobManager failed: Lost connection to the JobManager.

I do not understand what could be the root cause of this. The IPs look OK and there is no firewall blocking anything.


Re: lost connection

Posted by Chesnay Schepler <ch...@apache.org>.
Hello,

The first step is always to check the logs under /log. The JobManager
log in particular may contain clues as to why no connection could be
established.
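
A minimal sketch of such a log check follows. It assumes log4j-style lines like those in the client log quoted in this thread (a time stamp, optionally preceded by a date, followed by the level) and simply collects WARN/ERROR entries and exception headers. The function name suspicious_lines and both patterns are illustrative assumptions, not a Flink tool.

```python
import re

# Assumed line shape: "[YYYY-MM-DD ]HH:MM:SS,mmm LEVEL logger - message".
LEVEL_RE = re.compile(
    r"^(?:\d{4}-\d{2}-\d{2} )?\d{2}:\d{2}:\d{2},\d{3}\s+(WARN|ERROR)\b"
)
# First line of a Java stack trace, e.g. "java.net.ConnectException: ...".
EXCEPTION_RE = re.compile(r"^(?:Caused by: )?[\w.$]+(?:Exception|Error)\b")


def suspicious_lines(lines):
    """Return (line_number, text) pairs for WARN/ERROR entries and exception headers."""
    hits = []
    for number, line in enumerate(lines, start=1):
        text = line.rstrip("\n")
        if LEVEL_RE.search(text) or EXCEPTION_RE.search(text):
            hits.append((number, text))
    return hits
```

To apply it to a file, open the JobManager log found in the log directory and pass the file object, for example: with open(path) as f: hits = suspicious_lines(f). Here path is a placeholder for whatever that file is called on the system in question.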

Regards,
Chesnay
