Posted to commits@dolphinscheduler.apache.org by GitBox <gi...@apache.org> on 2020/09/22 13:57:24 UTC

[GitHub] [incubator-dolphinscheduler] nightxing opened a new issue #3789: [Bug][remote] channel time out

nightxing opened a new issue #3789:
URL: https://github.com/apache/incubator-dolphinscheduler/issues/3789


   **Describe the bug**
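   Under certain network conditions, when the master submits a task, Netty communication fails and the task information never reaches the worker. After a long wait a timeout exception is thrown, and some time later the same symptom recurs.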
   
   **To Reproduce**
   Steps to reproduce the behavior, for example:
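   1. Manually run a workflow.
   2. The workflow is in the running state, and every task shows the grey dot of the "submitted" state.
   3. After a long time, the master node throws a timeout exception.
   4. The worker never receives the message from the master.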
   
   **Expected behavior**
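   In the send method, the channel is checked for the active state when it is obtained. I suspect that a channel reported as active here cannot actually send data to the worker. Once that channel fails with an exception, the re-created channel works for a short while, but after some time the problem reproduces.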
   
   **Screenshots**
   Screenshots cannot be taken in our company environment.
   
   
   **Which version of Dolphin Scheduler:**
    -[1.3.1]
    -[1.3.2]
   
   **Additional context**
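   Results may differ between network environments: a friend's test cluster showed no exception, while their production cluster did. My own production environment has not gone live for testing yet; in my test environment the problem occurs roughly once every half hour.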
   
   **Requirement or improvement**
   - Please fix this as soon as possible; it seriously affects scheduling.
   


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [incubator-dolphinscheduler] CalvinKirs commented on issue #3789: [Bug][remote] channel time out

Posted by GitBox <gi...@apache.org>.
CalvinKirs commented on issue #3789:
URL: https://github.com/apache/incubator-dolphinscheduler/issues/3789#issuecomment-696804370


   Hi, can you provide the detailed stack trace for this error? In addition, please confirm whether your network environment changed during this period.





[GitHub] [incubator-dolphinscheduler] Slowfever-star commented on issue #3789: [Bug][remote] channel time out

Posted by GitBox <gi...@apache.org>.
Slowfever-star commented on issue #3789:
URL: https://github.com/apache/incubator-dolphinscheduler/issues/3789#issuecomment-697076408


   I have also encountered a similar problem; the error report is as follows:
   ![Uploading 微信图片_20200923095731.png…]()
   





[GitHub] [incubator-dolphinscheduler] dailidong commented on issue #3789: [Bug][remote] channel time out

Posted by GitBox <gi...@apache.org>.
dailidong commented on issue #3789:
URL: https://github.com/apache/incubator-dolphinscheduler/issues/3789#issuecomment-698999334


   > hi, do you have any ideas on how to solve this problem?
   
   Please refer to: https://lists.apache.org/thread.html/rb3e3c5f09764bae74cdeef16ee12db0e751d463fd2aed2d011ad5c6e%40%3Cdev.dolphinscheduler.apache.org%3E





[GitHub] [incubator-dolphinscheduler] qiaozhanwei commented on issue #3789: [Bug][remote] channel time out

Posted by GitBox <gi...@apache.org>.
qiaozhanwei commented on issue #3789:
URL: https://github.com/apache/incubator-dolphinscheduler/issues/3789#issuecomment-698233149


   I think this is most likely a network jitter problem. That said, in NettyRemotingClient.createChannel, I think future.sync() could be changed to channelFuture.awaitUninterruptibly(this.nettyClientConfig.getConnectTimeoutMillis()).
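
The change suggested above amounts to replacing an unbounded wait on the connect future with a wait capped at the configured connect timeout. A minimal sketch of that idea follows, using the JDK's `CompletableFuture` as a stand-in for Netty's `ChannelFuture` so that it runs without Netty on the classpath. The class `BoundedConnectWait` and the method `awaitConnect` are hypothetical names for illustration, not DolphinScheduler code.

```java
import java.util.concurrent.CancellationException;
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.TimeoutException;

// Hypothetical illustration: a bounded wait on a pending "connect" future.
final class BoundedConnectWait {

    /**
     * Waits at most timeoutMillis for the connect future to finish.
     * Returns true only if it completed successfully within the limit.
     */
    static boolean awaitConnect(CompletableFuture<Void> connectFuture, long timeoutMillis) {
        try {
            connectFuture.get(timeoutMillis, TimeUnit.MILLISECONDS);
            return true;
        } catch (TimeoutException e) {
            // Still pending: give up rather than block the submitting thread indefinitely.
            connectFuture.cancel(true);
            return false;
        } catch (ExecutionException | CancellationException e) {
            // The connect attempt failed or was cancelled.
            return false;
        } catch (InterruptedException e) {
            // Restore the interrupt flag so callers can react to it.
            Thread.currentThread().interrupt();
            return false;
        }
    }

    public static void main(String[] args) {
        // A future that never completes models a connect attempt that hangs.
        CompletableFuture<Void> hanging = new CompletableFuture<>();
        System.out.println("hanging connect usable: " + awaitConnect(hanging, 200));

        // An already completed future models a successful connect.
        CompletableFuture<Void> connected = CompletableFuture.completedFuture(null);
        System.out.println("completed connect usable: " + awaitConnect(connected, 200));
    }
}
```

With Netty itself, `awaitUninterruptibly(timeoutMillis)` returns true once the future has completed within the limit, whether it succeeded or failed, so the caller would still need to check `isSuccess()` before using the channel.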





[GitHub] [incubator-dolphinscheduler] dailidong closed issue #3789: [Bug][remote] channel time out

Posted by GitBox <gi...@apache.org>.
dailidong closed issue #3789:
URL: https://github.com/apache/incubator-dolphinscheduler/issues/3789


   





[GitHub] [incubator-dolphinscheduler] Slowfever-star edited a comment on issue #3789: [Bug][remote] channel time out

Posted by GitBox <gi...@apache.org>.
Slowfever-star edited a comment on issue #3789:
URL: https://github.com/apache/incubator-dolphinscheduler/issues/3789#issuecomment-697076408


   I have also encountered a similar problem; the error report is as follows:
   ![微信图片_20200923095731](https://user-images.githubusercontent.com/71742220/93956041-d0921880-fd83-11ea-9492-3542f0bf7c64.png)
   





[GitHub] [incubator-dolphinscheduler] nightxing commented on issue #3789: [Bug][remote] channel time out

Posted by GitBox <gi...@apache.org>.
nightxing commented on issue #3789:
URL: https://github.com/apache/incubator-dolphinscheduler/issues/3789#issuecomment-698670220


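   I added a scheduled task in DS that runs every 20 minutes to keep the master-to-worker communication alive. It has run stably overnight with no more timeouts, so I think adding a heartbeat mechanism would fully solve this. One more thing: the worker node does not delete its working directory after finishing a task. Is the directory-deletion method on the master? That is also a big problem!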
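
The workaround described above keeps the connection busy with periodic traffic, which is what a heartbeat would do directly. A minimal sketch of such a heartbeat follows, using only the JDK; `HeartbeatSketch`, `PingSender`, and the missed-beat threshold are hypothetical names and parameters chosen for illustration, not part of DolphinScheduler or Netty. In a Netty pipeline, idle detection would more commonly come from `IdleStateHandler`, which is not shown here.

```java
import java.util.concurrent.Executors;
import java.util.concurrent.ScheduledExecutorService;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicInteger;

// Hypothetical illustration: periodic pings with a missed-beat counter.
final class HeartbeatSketch {

    /** Stand-in for "write a ping to the worker and report whether it was acknowledged". */
    interface PingSender {
        boolean ping();
    }

    private final ScheduledExecutorService scheduler =
            Executors.newSingleThreadScheduledExecutor();
    private final AtomicInteger missed = new AtomicInteger();
    private final PingSender sender;
    private final int maxMissed;

    HeartbeatSketch(PingSender sender, int maxMissed) {
        this.sender = sender;
        this.maxMissed = maxMissed;
    }

    /** Schedules one ping every periodMillis. */
    void start(long periodMillis) {
        scheduler.scheduleAtFixedRate(this::beat, periodMillis, periodMillis,
                TimeUnit.MILLISECONDS);
    }

    /** Sends one ping; resets the counter on success, increments it on failure. */
    void beat() {
        boolean acknowledged;
        try {
            acknowledged = sender.ping();
        } catch (RuntimeException e) {
            acknowledged = false;
        }
        if (acknowledged) {
            missed.set(0);
        } else {
            missed.incrementAndGet();
        }
    }

    /** True once maxMissed consecutive pings have failed; the caller should reconnect. */
    boolean isStale() {
        return missed.get() >= maxMissed;
    }

    void stop() {
        scheduler.shutdownNow();
    }

    public static void main(String[] args) throws InterruptedException {
        // A sender that never gets an acknowledgement models a silently dead channel.
        HeartbeatSketch heartbeat = new HeartbeatSketch(() -> false, 3);
        heartbeat.start(50);
        Thread.sleep(400);
        System.out.println("channel stale: " + heartbeat.isStale());
        heartbeat.stop();
    }
}
```

The point of the counter is that a channel which still reports itself as active gets treated as dead after a few unanswered pings, so it can be closed and re-created before a task submission has to wait for a long timeout.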





[GitHub] [incubator-dolphinscheduler] CalvinKirs commented on issue #3789: [Bug][remote] channel time out

Posted by GitBox <gi...@apache.org>.
CalvinKirs commented on issue #3789:
URL: https://github.com/apache/incubator-dolphinscheduler/issues/3789#issuecomment-698694329


   Thank you so much for your feedback, I will finish it as soon as possible
   email discussion link:
   https://lists.apache.org/thread.html/rb3e3c5f09764bae74cdeef16ee12db0e751d463fd2aed2d011ad5c6e%40%3Cdev.dolphinscheduler.apache.org%3E





[GitHub] [incubator-dolphinscheduler] Slowfever-star commented on issue #3789: [Bug][remote] channel time out

Posted by GitBox <gi...@apache.org>.
Slowfever-star commented on issue #3789:
URL: https://github.com/apache/incubator-dolphinscheduler/issues/3789#issuecomment-698777885


   hi, do you have any ideas on how to solve this problem?
   





[GitHub] [incubator-dolphinscheduler] CalvinKirs commented on issue #3789: [Bug][remote] channel time out

Posted by GitBox <gi...@apache.org>.
CalvinKirs commented on issue #3789:
URL: https://github.com/apache/incubator-dolphinscheduler/issues/3789#issuecomment-698171929


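   > I monitored the network today: the connection between the two servers never dropped, yet timeouts still occur from time to time. It is strange that nobody else has reported this. Is there heartbeat detection in the Netty communication? Could a heartbeat mechanism be added to improve stability?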
   
   Thank you very much for your feedback. We will send an email today to discuss related solutions for this issue. We will synchronize the issue at that time and we will resolve this issue as soon as possible.





[GitHub] [incubator-dolphinscheduler] nightxing commented on issue #3789: [Bug][remote] channel time out

Posted by GitBox <gi...@apache.org>.
nightxing commented on issue #3789:
URL: https://github.com/apache/incubator-dolphinscheduler/issues/3789#issuecomment-698153528


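   I monitored the network today: the connection between the two servers never dropped, yet timeouts still occur from time to time. It is strange that nobody else has reported this. Is there heartbeat detection in the Netty communication? Could a heartbeat mechanism be added to improve stability?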

