You are viewing a plain text version of this content. The canonical link for it is here.
Posted to dev@rocketmq.apache.org by GitBox <gi...@apache.org> on 2020/11/13 08:43:43 UTC

[GitHub] [rocketmq] a619646645 opened a new issue #2427: Producer wait for broker response when send a message often shows RemotingTimeoutException

a619646645 opened a new issue #2427:
URL: https://github.com/apache/rocketmq/issues/2427


   As the title,when producer send messages to broker,shows the flowing WARN:
   `RemotingTimeoutException: wait response on the channel  timeout 3000ms.`
   
   We excluded network problems by network packet capture.
   
   When producer send A message ,the broker need 4000ms to respond, or even more.This has exceeded the maximum timeout.
   
   I'm sorry I can't go to the email community to discuss it because I'm a Chinese developer, And the community is for me is 404ERROR.
   
   NEED some help,PLEASE.
   Thank you Handsome guys.
   


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [rocketmq] a619646645 commented on issue #2427: Producer wait for broker response when send a message often shows RemotingTimeoutException

Posted by GitBox <gi...@apache.org>.
a619646645 commented on issue #2427:
URL: https://github.com/apache/rocketmq/issues/2427#issuecomment-727159257


   > 需要查看一下rocketmqlogs下的日志,一些存储日志查看一下刷盘时间,或者查看下是否是同步数据慢了,还有查看一下是否有一些明显的错误日志
   
   您好,感谢您的解答,我在 store.log 发现了如下错误,请问下是这个原因引起的吗?
   ```
   2020-11-10 18:47:17 WARN ReadSocketService - ha housekeeping, found this connection[/10.8.130.121:58620] expired, 20020
   2020-11-10 18:47:17 INFO ReadSocketService - ReadSocketService service end
   2020-11-10 18:47:17 INFO AcceptSocketService - HAService receive new connection, /10.8.130.121:59356
   2020-11-10 18:47:17 INFO ReadSocketService - ReadSocketService service started
   2020-11-10 18:47:17 INFO WriteSocketService - WriteSocketService service started
   2020-11-10 18:47:17 INFO WriteSocketService - WriteSocketService service end
   2020-11-10 18:47:20 INFO StoreStatsService - [STORETPS] put_tps 0.6832877808146124 get_found_tps 0.6832877808146124 get_miss_tps 1.9832011199253383 get_transfered_tps 0.6832877808146124
   2020-11-10 18:47:20 INFO StoreStatsService - [PAGECACHERT] TotalPut 38, PutMessageDistributeTime [<=0ms]:15 [0~10ms]:23 [10~50ms]:0 [50~100ms]:0 [100~200ms]:0 [200~500ms]:0 [500ms~1s]:0 [1~2s]:0 [2~3s]:0 [3~4s]:0 [4~5s]:0 [5~10s]:0 [10s~]:0 
   2020-11-10 18:47:22 INFO ReadSocketService - slave[/10.8.130.121:59356] request offset 1280444295
   2020-11-10 18:47:22 INFO WriteSocketService - master transfer data from 1280444295 to slave[/10.8.130.121:59356], and slave request 1280444295
   2020-11-10 18:47:22 WARN SendMessageThread_5 - putMessage not in lock eclipse time(ms)=4401, bodyLength=520
   2020-11-10 18:47:22 WARN SendMessageThread_44 - putMessage not in lock eclipse time(ms)=4344, bodyLength=497
   2020-11-10 18:47:22 WARN SendMessageThread_35 - putMessage not in lock eclipse time(ms)=2453, bodyLength=539
   2020-11-10 18:47:22 WARN SendMessageThread_43 - putMessage not in lock eclipse time(ms)=3086, bodyLength=552
   ```


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [rocketmq] a619646645 commented on issue #2427: Producer wait for broker response when send a message often shows RemotingTimeoutException

Posted by GitBox <gi...@apache.org>.
a619646645 commented on issue #2427:
URL: https://github.com/apache/rocketmq/issues/2427#issuecomment-730199185


   这几天在写CURD,这个问题还是存在,
   #### store.log中[PAGECACHERT]的统计存在明显的刷盘超时:
   `2020-11-18 14:22:05 INFO StoreStatsService - [PAGECACHERT] TotalPut 50, PutMessageDistributeTime [<=0ms]:10 [0~10ms]:35 [10~50ms]:0 [50~100ms]:0 [100~200ms]:0 [200~500ms]:0 [500ms~1s]:1 [1~2s]:1 [2~3s]:1 [3~4s]:0 [4~5s]:2 [5~10s]:0 [10s~]:0 `
   
   #### 在每次发生这个现象的时候都会出现这些异常:
   我们的broker配置如下:
   ```
   brokerClusterName = DefaultCluster
   brokerName = broker-a
   brokerId = 0
   deleteWhen = 04
   fileReservedTime = 48
   brokerRole = ASYNC_MASTER
   flushDiskType = ASYNC_FLUSH
   ```
   
   看PAGECACHERT中有[2~3s]:1  [4~5s]:2这种明显的超时
   ```
   2020-11-18 14:21:05 INFO StoreStatsService - [PAGECACHERT] TotalPut 52, PutMessageDistributeTime [<=0ms]:9 [0~10ms]:43 [10~50ms]:0 [50~100ms]:0 [100~200ms]:0 [200~500ms]:0 [500ms~1s]:0 [1~2s]:0 [2~3s]:0 [3~4s]:0 [4~5s]:0 [5~10s]:0 [10s~]:0 
   2020-11-18 14:21:14 WARN ReadSocketService - ha housekeeping, found this connection[/slaveIP:47894] expired, 20019
   2020-11-18 14:21:14 INFO ReadSocketService - ReadSocketService service end
   2020-11-18 14:21:14 INFO AcceptSocketService - HAService receive new connection, /slaveIP:33400
   2020-11-18 14:21:14 INFO ReadSocketService - ReadSocketService service started
   2020-11-18 14:21:14 INFO WriteSocketService - WriteSocketService service started
   2020-11-18 14:21:14 INFO WriteSocketService - WriteSocketService service end
   2020-11-18 14:21:19 INFO ReadSocketService - slave[/slaveIP:33400] request offset 2235247317
   2020-11-18 14:21:19 INFO WriteSocketService - master transfer data from 2235247317 to slave[/slaveIP:33400], and slave request 2235247317
   2020-11-18 14:21:19 WARN SendMessageThread_40 - putMessage not in lock eclipse time(ms)=4370, bodyLength=492
   2020-11-18 14:21:19 WARN SendMessageThread_37 - putMessage not in lock eclipse time(ms)=4056, bodyLength=466
   2020-11-18 14:21:19 WARN SendMessageThread_34 - putMessage not in lock eclipse time(ms)=2280, bodyLength=459
   2020-11-18 14:21:19 WARN SendMessageThread_10 - putMessage not in lock eclipse time(ms)=1081, bodyLength=523
   2020-11-18 14:21:19 WARN SendMessageThread_28 - putMessage not in lock eclipse time(ms)=719, bodyLength=444
   2020-11-18 14:22:05 INFO StoreStatsService - [STORETPS] put_tps 0.8332777814812345 get_found_tps 0.8332777814812345 get_miss_tps 4.466368908739417 get_transfered_tps 0.8332777814812345
   2020-11-18 14:22:05 INFO StoreStatsService - [PAGECACHERT] TotalPut 50, PutMessageDistributeTime [<=0ms]:10 [0~10ms]:35 [10~50ms]:0 [50~100ms]:0 [100~200ms]:0 [200~500ms]:0 [500ms~1s]:1 [1~2s]:1 [2~3s]:1 [3~4s]:0 [4~5s]:2 [5~10s]:0 [10s~]:0 
   ```
   
   #### GC的情况:
   查看了GC的日志,发现GC是没有什么问题的,GC时间都很短而且在RemotingTimeoutException异常发生时没有出现GC
   
   认真研读了Commitlog#putmessage(msg) 的逻辑,同时观察日志没有出现明显的异常,现在似乎是卡在这里了,是不是我们的内存配置,或者是主从交换的配置有什么问题,看到有一个issue:[issues428](https://github.com/apache/rocketmq/issues/428)的情况跟我一样,是有关于心跳的问题吗?
   
   @RongtongJin 期待您的回答,万分感谢您!


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [rocketmq] a619646645 commented on issue #2427: Producer wait for broker response when send a message often shows RemotingTimeoutException

Posted by GitBox <gi...@apache.org>.
a619646645 commented on issue #2427:
URL: https://github.com/apache/rocketmq/issues/2427#issuecomment-727162408


   我们现在是双主双从 8C32G的配置,客户端服务端版本都是4.2.0,两个broker都存在这个问题。
   同时,通过观察业务端日志,我还发现一个现象:
   在出现这个问题之前,某一个broker在大约20S内都没有接收到消息,例如broker-a出现了RemotingTimeoutException,那么在broker-a之前的20s内,他都是没有收到消息的。
   
   @RongtongJin 您能进一步给出排障意见吗?非常感谢!


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [rocketmq] a619646645 commented on issue #2427: Producer wait for broker response when send a message often shows RemotingTimeoutException

Posted by GitBox <gi...@apache.org>.
a619646645 commented on issue #2427:
URL: https://github.com/apache/rocketmq/issues/2427#issuecomment-726638897


   Waiting online. It's urgent
   在线等,挺急的


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [rocketmq] RongtongJin edited a comment on issue #2427: Producer wait for broker response when send a message often shows RemotingTimeoutException

Posted by GitBox <gi...@apache.org>.
RongtongJin edited a comment on issue #2427:
URL: https://github.com/apache/rocketmq/issues/2427#issuecomment-726666197


   需要查看一下rocketmqlogs下的日志,比如store日志查看一下刷盘时间,或者查看下是否是同步数据慢了,还有查看一下是否有一些明显的error日志


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [rocketmq] RongtongJin commented on issue #2427: Producer wait for broker response when send a message often shows RemotingTimeoutException

Posted by GitBox <gi...@apache.org>.
RongtongJin commented on issue #2427:
URL: https://github.com/apache/rocketmq/issues/2427#issuecomment-731016501


   > 这几天在写CURD,这个问题还是存在,
   > 
   > #### store.log中[PAGECACHERT]的统计存在明显的刷盘超时:
   > `2020-11-18 14:22:05 INFO StoreStatsService - [PAGECACHERT] TotalPut 50, PutMessageDistributeTime [<=0ms]:10 [0~10ms]:35 [10~50ms]:0 [50~100ms]:0 [100~200ms]:0 [200~500ms]:0 [500ms~1s]:1 [1~2s]:1 [2~3s]:1 [3~4s]:0 [4~5s]:2 [5~10s]:0 [10s~]:0 `
   > 
   > #### 在每次发生这个现象的时候都会出现这些异常:
   > 我们的broker配置如下:
   > 
   > ```
   > brokerClusterName = DefaultCluster
   > brokerName = broker-a
   > brokerId = 0
   > deleteWhen = 04
   > fileReservedTime = 48
   > brokerRole = ASYNC_MASTER
   > flushDiskType = ASYNC_FLUSH
   > ```
   > 
   > 看PAGECACHERT中有 2--3s:1、4--5s:2 这种明显的超时
   > 
   > ```
   > 2020-11-18 14:21:05 INFO StoreStatsService - [PAGECACHERT] TotalPut 52, PutMessageDistributeTime [<=0ms]:9 [0~10ms]:43 [10~50ms]:0 [50~100ms]:0 [100~200ms]:0 [200~500ms]:0 [500ms~1s]:0 [1~2s]:0 [2~3s]:0 [3~4s]:0 [4~5s]:0 [5~10s]:0 [10s~]:0 
   > 2020-11-18 14:21:14 WARN ReadSocketService - ha housekeeping, found this connection[/slaveIP:47894] expired, 20019
   > 2020-11-18 14:21:14 INFO ReadSocketService - ReadSocketService service end
   > 2020-11-18 14:21:14 INFO AcceptSocketService - HAService receive new connection, /slaveIP:33400
   > 2020-11-18 14:21:14 INFO ReadSocketService - ReadSocketService service started
   > 2020-11-18 14:21:14 INFO WriteSocketService - WriteSocketService service started
   > 2020-11-18 14:21:14 INFO WriteSocketService - WriteSocketService service end
   > 2020-11-18 14:21:19 INFO ReadSocketService - slave[/slaveIP:33400] request offset 2235247317
   > 2020-11-18 14:21:19 INFO WriteSocketService - master transfer data from 2235247317 to slave[/slaveIP:33400], and slave request 2235247317
   > 2020-11-18 14:21:19 WARN SendMessageThread_40 - putMessage not in lock eclipse time(ms)=4370, bodyLength=492
   > 2020-11-18 14:21:19 WARN SendMessageThread_37 - putMessage not in lock eclipse time(ms)=4056, bodyLength=466
   > 2020-11-18 14:21:19 WARN SendMessageThread_34 - putMessage not in lock eclipse time(ms)=2280, bodyLength=459
   > 2020-11-18 14:21:19 WARN SendMessageThread_10 - putMessage not in lock eclipse time(ms)=1081, bodyLength=523
   > 2020-11-18 14:21:19 WARN SendMessageThread_28 - putMessage not in lock eclipse time(ms)=719, bodyLength=444
   > 2020-11-18 14:22:05 INFO StoreStatsService - [STORETPS] put_tps 0.8332777814812345 get_found_tps 0.8332777814812345 get_miss_tps 4.466368908739417 get_transfered_tps 0.8332777814812345
   > 2020-11-18 14:22:05 INFO StoreStatsService - [PAGECACHERT] TotalPut 50, PutMessageDistributeTime [<=0ms]:10 [0~10ms]:35 [10~50ms]:0 [50~100ms]:0 [100~200ms]:0 [200~500ms]:0 [500ms~1s]:1 [1~2s]:1 [2~3s]:1 [3~4s]:0 [4~5s]:2 [5~10s]:0 [10s~]:0 
   > ```
   > 
   > #### GC的情况:
   > 查看了GC的日志,发现GC是没有什么问题的,GC时间都很短而且在RemotingTimeoutException异常发生时没有出现GC
   > 
   > 认真研读了Commitlog#putmessage(msg) 的逻辑,同时观察日志没有出现明显的异常,现在似乎是卡在这里了,是不是我们的内存配置,或者是主从交换的配置有什么问题,看到有一个issue:[issues428](https://github.com/apache/rocketmq/issues/428)的情况跟我一样,是有关于心跳的问题吗?
   > 
   > @RongtongJin 期待您的回答,万分感谢您!
   
    那就看看是不是磁盘的问题,应该刷盘慢的时间段await, io util也很高,但看起来业务量不大。配置看起来设置比较简单,MQ这边可以调优一下线程数sendMessageThreadPoolNums,开启transientStorePoolEnable减少刷盘毛刺。可以执行下distribution/bin/os.sh脚本优化一下操作系统参数,特别是一些脏页writeback参数、IO请求调度算法等,另外尽量不要使用swap分区,需要通过工具来复现找出具体原因。
   
   和issue428应该没有什么关系,刷盘慢是原因,timeout是结果。


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [rocketmq] a619646645 commented on issue #2427: Producer wait for broker response when send a message often shows RemotingTimeoutException

Posted by GitBox <gi...@apache.org>.
a619646645 commented on issue #2427:
URL: https://github.com/apache/rocketmq/issues/2427#issuecomment-726619087


   By the way,this will not happen at the peak of our operations


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [rocketmq] a619646645 edited a comment on issue #2427: Producer wait for broker response when send a message often shows RemotingTimeoutException

Posted by GitBox <gi...@apache.org>.
a619646645 edited a comment on issue #2427:
URL: https://github.com/apache/rocketmq/issues/2427#issuecomment-730199185


   这几天在写CURD,这个问题还是存在,
   #### store.log中[PAGECACHERT]的统计存在明显的刷盘超时:
   `2020-11-18 14:22:05 INFO StoreStatsService - [PAGECACHERT] TotalPut 50, PutMessageDistributeTime [<=0ms]:10 [0~10ms]:35 [10~50ms]:0 [50~100ms]:0 [100~200ms]:0 [200~500ms]:0 [500ms~1s]:1 [1~2s]:1 [2~3s]:1 [3~4s]:0 [4~5s]:2 [5~10s]:0 [10s~]:0 `
   
   #### 在每次发生这个现象的时候都会出现这些异常:
   我们的broker配置如下:
   ```
   brokerClusterName = DefaultCluster
   brokerName = broker-a
   brokerId = 0
   deleteWhen = 04
   fileReservedTime = 48
   brokerRole = ASYNC_MASTER
   flushDiskType = ASYNC_FLUSH
   ```
   
   看PAGECACHERT中有 2--3s:1、4--5s:2   这种明显的超时
   ```
   2020-11-18 14:21:05 INFO StoreStatsService - [PAGECACHERT] TotalPut 52, PutMessageDistributeTime [<=0ms]:9 [0~10ms]:43 [10~50ms]:0 [50~100ms]:0 [100~200ms]:0 [200~500ms]:0 [500ms~1s]:0 [1~2s]:0 [2~3s]:0 [3~4s]:0 [4~5s]:0 [5~10s]:0 [10s~]:0 
   2020-11-18 14:21:14 WARN ReadSocketService - ha housekeeping, found this connection[/slaveIP:47894] expired, 20019
   2020-11-18 14:21:14 INFO ReadSocketService - ReadSocketService service end
   2020-11-18 14:21:14 INFO AcceptSocketService - HAService receive new connection, /slaveIP:33400
   2020-11-18 14:21:14 INFO ReadSocketService - ReadSocketService service started
   2020-11-18 14:21:14 INFO WriteSocketService - WriteSocketService service started
   2020-11-18 14:21:14 INFO WriteSocketService - WriteSocketService service end
   2020-11-18 14:21:19 INFO ReadSocketService - slave[/slaveIP:33400] request offset 2235247317
   2020-11-18 14:21:19 INFO WriteSocketService - master transfer data from 2235247317 to slave[/slaveIP:33400], and slave request 2235247317
   2020-11-18 14:21:19 WARN SendMessageThread_40 - putMessage not in lock eclipse time(ms)=4370, bodyLength=492
   2020-11-18 14:21:19 WARN SendMessageThread_37 - putMessage not in lock eclipse time(ms)=4056, bodyLength=466
   2020-11-18 14:21:19 WARN SendMessageThread_34 - putMessage not in lock eclipse time(ms)=2280, bodyLength=459
   2020-11-18 14:21:19 WARN SendMessageThread_10 - putMessage not in lock eclipse time(ms)=1081, bodyLength=523
   2020-11-18 14:21:19 WARN SendMessageThread_28 - putMessage not in lock eclipse time(ms)=719, bodyLength=444
   2020-11-18 14:22:05 INFO StoreStatsService - [STORETPS] put_tps 0.8332777814812345 get_found_tps 0.8332777814812345 get_miss_tps 4.466368908739417 get_transfered_tps 0.8332777814812345
   2020-11-18 14:22:05 INFO StoreStatsService - [PAGECACHERT] TotalPut 50, PutMessageDistributeTime [<=0ms]:10 [0~10ms]:35 [10~50ms]:0 [50~100ms]:0 [100~200ms]:0 [200~500ms]:0 [500ms~1s]:1 [1~2s]:1 [2~3s]:1 [3~4s]:0 [4~5s]:2 [5~10s]:0 [10s~]:0 
   ```
   
   #### GC的情况:
   查看了GC的日志,发现GC是没有什么问题的,GC时间都很短而且在RemotingTimeoutException异常发生时没有出现GC
   
   认真研读了Commitlog#putmessage(msg) 的逻辑,同时观察日志没有出现明显的异常,现在似乎是卡在这里了,是不是我们的内存配置,或者是主从交换的配置有什么问题,看到有一个issue:[issues428](https://github.com/apache/rocketmq/issues/428)的情况跟我一样,是有关于心跳的问题吗?
   
   @RongtongJin 期待您的回答,万分感谢您!


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [rocketmq] a619646645 edited a comment on issue #2427: Producer wait for broker response when send a message often shows RemotingTimeoutException

Posted by GitBox <gi...@apache.org>.
a619646645 edited a comment on issue #2427:
URL: https://github.com/apache/rocketmq/issues/2427#issuecomment-727159257


   > 需要查看一下rocketmqlogs下的日志,一些存储日志查看一下刷盘时间,或者查看下是否是同步数据慢了,还有查看一下是否有一些明显的错误日志
   
   您好,感谢您的解答,我在 store.log 发现了如下错误,请问下是这个原因引起的吗?IP我用slaveIP代替了
   ```
   2020-11-10 18:47:17 WARN ReadSocketService - ha housekeeping, found this connection[/slaveIP:58620] expired, 20020
   2020-11-10 18:47:17 INFO ReadSocketService - ReadSocketService service end
   2020-11-10 18:47:17 INFO AcceptSocketService - HAService receive new connection, /slaveIP:59356
   2020-11-10 18:47:17 INFO ReadSocketService - ReadSocketService service started
   2020-11-10 18:47:17 INFO WriteSocketService - WriteSocketService service started
   2020-11-10 18:47:17 INFO WriteSocketService - WriteSocketService service end
   2020-11-10 18:47:20 INFO StoreStatsService - [STORETPS] put_tps 0.6832877808146124 get_found_tps 0.6832877808146124 get_miss_tps 1.9832011199253383 get_transfered_tps 0.6832877808146124
   2020-11-10 18:47:20 INFO StoreStatsService - [PAGECACHERT] TotalPut 38, PutMessageDistributeTime [<=0ms]:15 [0~10ms]:23 [10~50ms]:0 [50~100ms]:0 [100~200ms]:0 [200~500ms]:0 [500ms~1s]:0 [1~2s]:0 [2~3s]:0 [3~4s]:0 [4~5s]:0 [5~10s]:0 [10s~]:0 
   2020-11-10 18:47:22 INFO ReadSocketService - slave[/10.8.130.121:59356] request offset 1280444295
   2020-11-10 18:47:22 INFO WriteSocketService - master transfer data from 1280444295 to slave[/10.8.130.121:59356], and slave request 1280444295
   2020-11-10 18:47:22 WARN SendMessageThread_5 - putMessage not in lock eclipse time(ms)=4401, bodyLength=520
   2020-11-10 18:47:22 WARN SendMessageThread_44 - putMessage not in lock eclipse time(ms)=4344, bodyLength=497
   2020-11-10 18:47:22 WARN SendMessageThread_35 - putMessage not in lock eclipse time(ms)=2453, bodyLength=539
   2020-11-10 18:47:22 WARN SendMessageThread_43 - putMessage not in lock eclipse time(ms)=3086, bodyLength=552
   ```
   
   同时也在broker.log发现如下错误:
   IP 用nameServerIp代替
   ```
   2020-10-15 15:14:22 WARN main - registerBroker Exception, nameServerIp:9876
   org.apache.rocketmq.remoting.exception.RemotingConnectException: connect to <nameServerIp:9876> failed
   	at org.apache.rocketmq.remoting.netty.NettyRemotingClient.invokeSync(NettyRemotingClient.java:388) ~[rocketmq-remoting-4.2.0-frxs.jar:4.2.0-frxs]
   	at org.apache.rocketmq.broker.out.BrokerOuterAPI.registerBroker(BrokerOuterAPI.java:167) ~[rocketmq-broker-4.2.0-frxs.jar:4.2.0-frxs]
   	at org.apache.rocketmq.broker.out.BrokerOuterAPI.registerBrokerAll(BrokerOuterAPI.java:116) ~[rocketmq-broker-4.2.0-frxs.jar:4.2.0-frxs]
   	at org.apache.rocketmq.broker.BrokerController.registerBrokerAll(BrokerController.java:719) [rocketmq-broker-4.2.0-frxs.jar:4.2.0-frxs]
   	at org.apache.rocketmq.broker.BrokerController.start(BrokerController.java:681) [rocketmq-broker-4.2.0-frxs.jar:4.2.0-frxs]
   	at org.apache.rocketmq.broker.BrokerStartup.start(BrokerStartup.java:62) [rocketmq-broker-4.2.0-frxs.jar:4.2.0-frxs]
   	at org.apache.rocketmq.broker.BrokerStartup.main(BrokerStartup.java:56) [rocketmq-broker-4.2.0-frxs.jar:4.2.0-frxs]
   ```


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [rocketmq] RongtongJin commented on issue #2427: Producer wait for broker response when send a message often shows RemotingTimeoutException

Posted by GitBox <gi...@apache.org>.
RongtongJin commented on issue #2427:
URL: https://github.com/apache/rocketmq/issues/2427#issuecomment-732151858


   I will close the Issue first, but please feel free to reopen it if you have any other issues.


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [rocketmq] RongtongJin commented on issue #2427: Producer wait for broker response when send a message often shows RemotingTimeoutException

Posted by GitBox <gi...@apache.org>.
RongtongJin commented on issue #2427:
URL: https://github.com/apache/rocketmq/issues/2427#issuecomment-726666197


   需要查看一下rocketmqlogs下的日志,比如store日志查看一下刷盘时间,或者查看下是否是同步数据慢了


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [rocketmq] a619646645 edited a comment on issue #2427: Producer wait for broker response when send a message often shows RemotingTimeoutException

Posted by GitBox <gi...@apache.org>.
a619646645 edited a comment on issue #2427:
URL: https://github.com/apache/rocketmq/issues/2427#issuecomment-730199185


   这几天在写CURD,这个问题还是存在,
   #### store.log中[PAGECACHERT]的统计存在明显的刷盘超时:
   `2020-11-18 14:22:05 INFO StoreStatsService - [PAGECACHERT] TotalPut 50, PutMessageDistributeTime [<=0ms]:10 [0~10ms]:35 [10~50ms]:0 [50~100ms]:0 [100~200ms]:0 [200~500ms]:0 [500ms~1s]:1 [1~2s]:1 [2~3s]:1 [3~4s]:0 [4~5s]:2 [5~10s]:0 [10s~]:0 `
   
   #### 在每次发生这个现象的时候都会出现这些异常:
   我们的broker配置如下:
   ```
   brokerClusterName = DefaultCluster
   brokerName = broker-a
   brokerId = 0
   deleteWhen = 04
   fileReservedTime = 48
   brokerRole = ASYNC_MASTER
   flushDiskType = ASYNC_FLUSH
   ```
   
   看PAGECACHERT中有2~3s:1、4~5s:2这种明显的超时
   ```
   2020-11-18 14:21:05 INFO StoreStatsService - [PAGECACHERT] TotalPut 52, PutMessageDistributeTime [<=0ms]:9 [0~10ms]:43 [10~50ms]:0 [50~100ms]:0 [100~200ms]:0 [200~500ms]:0 [500ms~1s]:0 [1~2s]:0 [2~3s]:0 [3~4s]:0 [4~5s]:0 [5~10s]:0 [10s~]:0 
   2020-11-18 14:21:14 WARN ReadSocketService - ha housekeeping, found this connection[/slaveIP:47894] expired, 20019
   2020-11-18 14:21:14 INFO ReadSocketService - ReadSocketService service end
   2020-11-18 14:21:14 INFO AcceptSocketService - HAService receive new connection, /slaveIP:33400
   2020-11-18 14:21:14 INFO ReadSocketService - ReadSocketService service started
   2020-11-18 14:21:14 INFO WriteSocketService - WriteSocketService service started
   2020-11-18 14:21:14 INFO WriteSocketService - WriteSocketService service end
   2020-11-18 14:21:19 INFO ReadSocketService - slave[/slaveIP:33400] request offset 2235247317
   2020-11-18 14:21:19 INFO WriteSocketService - master transfer data from 2235247317 to slave[/slaveIP:33400], and slave request 2235247317
   2020-11-18 14:21:19 WARN SendMessageThread_40 - putMessage not in lock eclipse time(ms)=4370, bodyLength=492
   2020-11-18 14:21:19 WARN SendMessageThread_37 - putMessage not in lock eclipse time(ms)=4056, bodyLength=466
   2020-11-18 14:21:19 WARN SendMessageThread_34 - putMessage not in lock eclipse time(ms)=2280, bodyLength=459
   2020-11-18 14:21:19 WARN SendMessageThread_10 - putMessage not in lock eclipse time(ms)=1081, bodyLength=523
   2020-11-18 14:21:19 WARN SendMessageThread_28 - putMessage not in lock eclipse time(ms)=719, bodyLength=444
   2020-11-18 14:22:05 INFO StoreStatsService - [STORETPS] put_tps 0.8332777814812345 get_found_tps 0.8332777814812345 get_miss_tps 4.466368908739417 get_transfered_tps 0.8332777814812345
   2020-11-18 14:22:05 INFO StoreStatsService - [PAGECACHERT] TotalPut 50, PutMessageDistributeTime [<=0ms]:10 [0~10ms]:35 [10~50ms]:0 [50~100ms]:0 [100~200ms]:0 [200~500ms]:0 [500ms~1s]:1 [1~2s]:1 [2~3s]:1 [3~4s]:0 [4~5s]:2 [5~10s]:0 [10s~]:0 
   ```
   
   #### GC的情况:
   查看了GC的日志,发现GC是没有什么问题的,GC时间都很短而且在RemotingTimeoutException异常发生时没有出现GC
   
   认真研读了Commitlog#putmessage(msg) 的逻辑,同时观察日志没有出现明显的异常,现在似乎是卡在这里了,是不是我们的内存配置,或者是主从交换的配置有什么问题,看到有一个issue:[issues428](https://github.com/apache/rocketmq/issues/428)的情况跟我一样,是有关于心跳的问题吗?
   
   @RongtongJin 期待您的回答,万分感谢您!


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [rocketmq] RongtongJin commented on issue #2427: Producer wait for broker response when send a message often shows RemotingTimeoutException

Posted by GitBox <gi...@apache.org>.
RongtongJin commented on issue #2427:
URL: https://github.com/apache/rocketmq/issues/2427#issuecomment-727212663


   > putMessage not in lock eclipse
   
   2020-11-10 18:47:22 WARN SendMessageThread_5 - putMessage not in lock eclipse time(ms)=4401, bodyLength=520
   2020-11-10 18:47:22 WARN SendMessageThread_44 - putMessage not in lock eclipse time(ms)=4344, bodyLength=497
   2020-11-10 18:47:22 WARN SendMessageThread_35 - putMessage not in lock eclipse time(ms)=2453, bodyLength=539
   2020-11-10 18:47:22 WARN SendMessageThread_43 - putMessage not in lock eclipse time(ms)=3086, bodyLength=552
   
   这些日志显示某些请求的putMessage时间超过了4s,但从当前日志无法判定是刷盘慢了,还是同步数据数据慢了,又或者是发生了full gc, 可以排查一下broker的gc日志,或者看看store.log中[PAGECACHERT]的统计是否有明显刷盘超时等


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [rocketmq] RongtongJin closed issue #2427: Producer wait for broker response when send a message often shows RemotingTimeoutException

Posted by GitBox <gi...@apache.org>.
RongtongJin closed issue #2427:
URL: https://github.com/apache/rocketmq/issues/2427


   


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org