Posted to users@kafka.apache.org by wanghai <wh...@outlook.com> on 2016/04/21 09:41:38 UTC

kafka 2.8.0-0.8.1.1 too many close_wait



Hello,

After the Kafka cluster has been running for a while, it gets stuck and consumers can’t read messages from it.

The cluster has 5 brokers: 0, 131, 132, 133 and 134. The Kafka version is 2.8.0-0.8.1.1.

Broker 132 has a large number of TCP connections in CLOSE_WAIT, while the other brokers have none. The count keeps growing until it reaches the “unix max open files” limit, and the broker is then killed with a “too many open files” error.

My “unix max open files” limit is 60000, which I think should be enough.
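A limit set in a shell does not always apply to an already running process, so it can be worth confirming what the broker process itself sees. The following is a minimal sketch, assuming a Linux host where `/proc/<pid>/limits` is readable; the function name `parse_open_files_limit` is made up for this illustration.

```python
import re


def _to_int(value):
    """Convert a limit field to int, or None when it reads 'unlimited'."""
    return None if value == "unlimited" else int(value)


def parse_open_files_limit(limits_text):
    """Return (soft, hard) for 'Max open files' from /proc/<pid>/limits text.

    Returns None if the text has no 'Max open files' row.
    """
    for line in limits_text.splitlines():
        if line.startswith("Max open files"):
            # Columns are separated by runs of two or more spaces.
            columns = re.split(r"\s{2,}", line.strip())
            return _to_int(columns[1]), _to_int(columns[2])
    return None
```

On the broker host this could be called as `parse_open_files_limit(open("/proc/17193/limits").read())`, where 17193 is the Java PID shown in the connection listing.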



tcp       70      0 192.168.10.132:9092         192.168.10.131:34266        CLOSE_WAIT  17193/java
tcp       70      0 192.168.10.132:9092         192.168.10.134:58585        CLOSE_WAIT  17193/java
tcp       70      0 192.168.10.132:9092         192.168.10.134:56025        CLOSE_WAIT  17193/java
tcp       70      0 192.168.10.132:9092         192.168.10.131:50139        CLOSE_WAIT  17193/java
tcp       62      0 192.168.10.132:9092         192.168.10.131:49371        CLOSE_WAIT  17193/java
tcp      253      0 192.168.10.132:9092         192.168.10.130:50909        CLOSE_WAIT  17193/java
tcp       62      0 192.168.10.132:9092         192.168.10.134:50905        CLOSE_WAIT  17193/java
tcp       70      0 192.168.10.132:9092         192.168.10.134:50393        CLOSE_WAIT  17193/java
tcp       72      0 192.168.10.132:9092         192.168.10.130:47837        CLOSE_WAIT  17193/java
tcp       70      0 192.168.10.132:9092         192.168.10.134:47321        CLOSE_WAIT  17193/java
tcp        1      0 192.168.10.132:9092         192.168.10.134:46809        CLOSE_WAIT  17193/java
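To see which peers the stuck connections come from, the listing above can be tallied by remote host. This is a rough sketch, assuming each connection sits on a single line in the usual netstat column order (protocol, Recv-Q, Send-Q, local address, foreign address, state); the function name `count_close_wait` and the default port are choices made for this example.

```python
from collections import Counter


def count_close_wait(netstat_text, local_port="9092"):
    """Count CLOSE_WAIT connections on local_port, grouped by remote host."""
    counts = Counter()
    for line in netstat_text.splitlines():
        fields = line.split()
        # Skip headers, blank lines and sockets in any other state.
        if len(fields) < 6 or fields[5] != "CLOSE_WAIT":
            continue
        local, foreign = fields[3], fields[4]
        # Only consider connections accepted on the broker port.
        if local.rsplit(":", 1)[-1] != local_port:
            continue
        counts[foreign.rsplit(":", 1)[0]] += 1
    return counts
```

Running this periodically and comparing the counts would show whether one particular peer is responsible for most of the growth.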




 



 

Logs from broker server 132:



[2016-04-20 01:09:48,736] INFO Closing socket connection to /192.168.10.130. (kafka.network.Processor)
[2016-04-20 01:09:49,332] INFO Closing socket connection to /192.168.10.130. (kafka.network.Processor)
[2016-04-20 01:09:51,523] ERROR Closing socket for /192.168.10.133 because of error (kafka.network.Processor)
java.io.IOException: Connection reset by peer
        at sun.nio.ch.FileDispatcher.read0(Native Method)
        at sun.nio.ch.SocketDispatcher.read(SocketDispatcher.java:21)
        at sun.nio.ch.IOUtil.readIntoNativeBuffer(IOUtil.java:233)
        at sun.nio.ch.IOUtil.read(IOUtil.java:206)
        at sun.nio.ch.SocketChannelImpl.read(SocketChannelImpl.java:236)
        at kafka.utils.Utils$.read(Utils.scala:375)
        at kafka.network.BoundedByteBufferReceive.readFrom(BoundedByteBufferReceive.scala:54)
        at kafka.network.Processor.read(SocketServer.scala:347)
        at kafka.network.Processor.run(SocketServer.scala:245)
        at java.lang.Thread.run(Thread.java:619)
[2016-04-20 01:09:54,023] INFO Closing socket connection to /192.168.10.134. (kafka.network.Processor)
[2016-04-20 01:09:56,285] INFO Closing socket connection to /192.168.10.134. (kafka.network.Processor)
[2016-04-20 01:09:56,968] ERROR Closing socket for /192.168.10.133 because of error (kafka.network.Processor)
java.io.IOException: Broken pipe
        at sun.nio.ch.FileDispatcher.write0(Native Method)
        at sun.nio.ch.SocketDispatcher.write(SocketDispatcher.java:29)
        at sun.nio.ch.IOUtil.writeFromNativeBuffer(IOUtil.java:104)
        at sun.nio.ch.IOUtil.write(IOUtil.java:75)
        at sun.nio.ch.SocketChannelImpl.write(SocketChannelImpl.java:334)
        at kafka.api.PartitionDataSend.writeTo(FetchResponse.scala:67)
        at kafka.network.MultiSend.writeTo(Transmission.scala:102)
        at kafka.api.TopicDataSend.writeTo(FetchResponse.scala:124)
        at kafka.network.MultiSend.writeTo(Transmission.scala:102)
        at kafka.api.FetchResponseSend.writeTo(FetchResponse.scala:219)
        at kafka.network.Processor.write(SocketServer.scala:375)
        at kafka.network.Processor.run(SocketServer.scala:247)
        at java.lang.Thread.run(Thread.java:619)
[2016-04-20 01:09:56,971] INFO Closing socket connection to /192.168.10.130. (kafka.network.Processor)
[2016-04-20 01:09:57,328] INFO Closing socket connection to /192.168.10.131. (kafka.network.Processor)
[2016-04-20 01:09:57,682] INFO Closing socket connection to /192.168.10.133. (kafka.network.Processor)
[2016-04-20 01:09:57,683] ERROR Closing socket for /192.168.10.131 because of error (kafka.network.Processor)
java.io.IOException: Connection reset by peer
        at sun.nio.ch.FileDispatcher.read0(Native Method)
        at sun.nio.ch.SocketDispatcher.read(SocketDispatcher.java:21)
        at sun.nio.ch.IOUtil.readIntoNativeBuffer(IOUtil.java:233)
        at sun.nio.ch.IOUtil.read(IOUtil.java:206)
        at sun.nio.ch.SocketChannelImpl.read(SocketChannelImpl.java:236)
        at kafka.utils.Utils$.read(Utils.scala:375)
        at kafka.network.BoundedByteBufferReceive.readFrom(BoundedByteBufferReceive.scala:54)
        at kafka.network.Processor.read(SocketServer.scala:347)
        at kafka.network.Processor.run(SocketServer.scala:245)
        at java.lang.Thread.run(Thread.java:619)
[2016-04-20 01:09:57,748] INFO Closing socket connection to /192.168.10.134. (kafka.network.Processor)
[2016-04-20 01:09:57,921] INFO Closing socket connection to /192.168.10.133. (kafka.network.Processor)
[2016-04-20 01:09:58,099] INFO Closing socket connection to /192.168.10.134. (kafka.network.Processor)
[2016-04-20 01:09:58,116] INFO Closing socket connection to /192.168.10.131. (kafka.network.Processor)
[2016-04-20 01:09:58,163] INFO Closing socket connection to /192.168.10.131. (kafka.network.Processor)
[2016-04-20 01:09:58,442] INFO Closing socket connection to /192.168.10.134. (kafka.network.Processor)
[2016-04-20 01:09:58,541] INFO Closing socket connection to /192.168.10.131. (kafka.network.Processor)
[2016-04-20 01:09:58,542] INFO Closing socket connection to /192.168.10.130. (kafka.network.Processor)
[2016-04-20 01:09:58,740] INFO Closing socket connection to /192.168.10.134. (kafka.network.Processor)
[2016-04-20 01:09:58,740] INFO Closing socket connection to /192.168.10.131. (kafka.network.Processor)
[2016-04-20 01:09:58,915] INFO Closing socket connection to /192.168.10.133. (kafka.network.Processor)
[2016-04-20 01:09:58,915] INFO Closing socket connection to /192.168.10.134. (kafka.network.Processor)
[2016-04-20 01:09:58,916] INFO Closing socket connection to /192.168.10.131. (kafka.network.Processor)
[2016-04-20 01:09:58,980] INFO Closing socket connection to /192.168.10.133. (kafka.network.Processor)
[2016-04-20 01:09:58,980] INFO Closing socket connection to /192.168.10.134. (kafka.network.Processor)
[2016-04-20 01:09:58,980] INFO Closing socket connection to /192.168.10.133. (kafka.network.Processor)
[2016-04-20 01:09:59,115] ERROR Closing socket for /192.168.10.133 because of error (kafka.network.Processor)
java.io.IOException: Broken pipe
        at sun.nio.ch.FileDispatcher.write0(Native Method)
        at sun.nio.ch.SocketDispatcher.write(SocketDispatcher.java:29)
        at sun.nio.ch.IOUtil.writeFromNativeBuffer(IOUtil.java:104)
        at sun.nio.ch.IOUtil.write(IOUtil.java:75)
        at sun.nio.ch.SocketChannelImpl.write(SocketChannelImpl.java:334)
        at kafka.api.PartitionDataSend.writeTo(FetchResponse.scala:67)
        at kafka.network.MultiSend.writeTo(Transmission.scala:102)
        at kafka.api.TopicDataSend.writeTo(FetchResponse.scala:124)
        at kafka.network.MultiSend.writeTo(Transmission.scala:102)
        at kafka.api.FetchResponseSend.writeTo(FetchResponse.scala:219)
        at kafka.network.Processor.write(SocketServer.scala:375)
        at kafka.network.Processor.run(SocketServer.scala:247)
        at java.lang.Thread.run(Thread.java:619)
[2016-04-20 01:09:59,115] INFO Closing socket connection to /192.168.10.134. (kafka.network.Processor)
[2016-04-20 01:09:59,115] INFO Closing socket connection to /192.168.10.131. (kafka.network.Processor)
[2016-04-20 01:09:59,329] INFO Closing socket connection to /192.168.10.133. (kafka.network.Processor)
[2016-04-20 01:09:59,329] INFO Closing socket connection to /192.168.10.134. (kafka.network.Processor)
[2016-04-20 01:09:59,329] INFO Closing socket connection to /192.168.10.133. (kafka.network.Processor)
[2016-04-20 01:09:59,332] INFO Closing socket connection to /192.168.10.131. (kafka.network.Processor)
[2016-04-20 01:13:43,821] INFO Partition [realtime_hardware,6] on broker 132: Shrinking ISR for partition [realtime_hardware,6] from 132,134,131 to 132 (kafka.cluster.Partition)
[2016-04-20 01:13:43,822] INFO Partition [realtime_hardware_meta,9] on broker 132: Shrinking ISR for partition [realtime_hardware_meta,9] from 132,133,131 to 132 (kafka.cluster.Partition)
[2016-04-20 01:13:43,823] INFO Partition [realtime_expansion,5] on broker 132: Shrinking ISR for partition [realtime_expansion,5] from 132,133 to 132 (kafka.cluster.Partition)
[2016-04-20 01:13:43,824] INFO Partition [realtime_capacity,11] on broker 132: Shrinking ISR for partition [realtime_capacity,11] from 132,134,131 to 132 (kafka.cluster.Partition)
[2016-04-20 01:13:43,825] INFO Partition [nginx_log,14] on broker 132: Shrinking ISR for partition [nginx_log,14] from 132,133,131 to 132 (kafka.cluster.Partition)
[2016-04-20 01:13:43,825] INFO Partition [nginx_log,8] on broker 132: Shrinking ISR for partition [nginx_log,8] from 132,133,131 to 132 (kafka.cluster.Partition)
[2016-04-20 01:13:43,826] INFO Partition [realtime_heartbeat,12] on broker 132: Shrinking ISR for partition [realtime_heartbeat,12] from 132,134,131 to 132 (kafka.cluster.Partition)
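The "Shrinking ISR" entries above show which followers fell out of sync with broker 132. As a small sketch for summarising them, assuming the log wording shown above, the helper below counts how often each replica was dropped; `dropped_replicas` is a name invented for this example.

```python
import re
from collections import Counter

# Matches e.g. "Shrinking ISR for partition [nginx_log,8] from 132,133,131 to 132"
ISR_SHRINK = re.compile(
    r"Shrinking ISR for partition \[[^\]]+\] from ([\d,]+) to ([\d,]+)"
)


def dropped_replicas(log_text):
    """Count, per broker id, how many partitions dropped it from their ISR."""
    # Collapse whitespace so entries wrapped across lines still match.
    flat = " ".join(log_text.split())
    dropped = Counter()
    for match in ISR_SHRINK.finditer(flat):
        old_isr = set(match.group(1).split(","))
        new_isr = set(match.group(2).split(","))
        dropped.update(old_isr - new_isr)
    return dropped
```

Applied to the seven entries above, this counts broker 131 six times, 133 four times and 134 three times, which is consistent with every follower having stopped fetching from 132 at about the same moment.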



 

So I removed broker 132 and restarted the Kafka cluster. After 24 hours the problem appeared again, this time on broker 131.

I don’t know what to do. Please help me.

 

Best wishes!



 
 		 	   		  

Re: kafka 2.8.0-0.8.1.1 too many close_wait

Posted by Manikumar Reddy <ma...@gmail.com>.
We fixed similar issues in the 0.8.2.0 release. You should consider
moving to the latest release.
