Posted to issues@zookeeper.apache.org by "hejie (Jira)" <ji...@apache.org> on 2022/10/13 12:24:00 UTC

[jira] [Updated] (ZOOKEEPER-4623) zookeeper oom

     [ https://issues.apache.org/jira/browse/ZOOKEEPER-4623?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

hejie updated ZOOKEEPER-4623:
-----------------------------
    Description: 
ZooKeeper is deployed as a three-node cluster. The disks on two of the nodes are full, so files can no longer be written there. The third node runs out of memory.

 

A large number of SSLSessionImpl objects appear to have accumulated, causing the memory overflow.

 

 

Large Object:

sun.security.ssl.SSLSocketImpl @ 0xe541fce8
  sun.security.ssl.TransportContext @ 0xe541ff78
    sun.security.ssl.SSLSessionImpl @ 0xe5427d20
      java.util.concurrent.ConcurrentLinkedQueue @ 0xe5427ec0
        java.util.concurrent.ConcurrentLinkedQueue$Node @ 0xe5427ed8
        java.util.concurrent.ConcurrentLinkedQueue$Node @ 0xe5427ef0
        sun.security.ssl.SSLSessionImpl @ 0xe5427f08
        java.util.concurrent.ConcurrentLinkedQueue @ 0xe54280a8
          java.util.concurrent.ConcurrentLinkedQueue$Node @ 0xe54280c0
            java.util.concurrent.ConcurrentLinkedQueue$Node @ 0xe54280d8
              sun.security.ssl.SSLSessionImpl @ 0xe54280f0
                 java.util.concurrent.ConcurrentLinkedQueue @ 0xe54281f8
                    java.util.concurrent.ConcurrentLinkedQueue$Node @ 0xe5428210

 

 

Thread:

 

org.apache.zookeeper.server.quorum.LearnerHandler#run

org.apache.jute.BinaryInputArchive#readRecord

xxx

org.apache.zookeeper.server.quorum.UnifiedServerSocket.UnifiedSocket#detectMode

xxxx

LOG.info("Accepted TLS connection from {} - {} - {}",sslSocket.getRemoteSocketAddress(),sslSocket.getSession().getProtocol(),sslSocket.getSession().getCipherSuite());

xxxx

sun.security.ssl.SSLSessionImpl#addChild

log:

 

09-27 00:45:11,708 INFO (WorkerReceiver[myid=2]) Notification: my state:LEADING; n.sid:0, n.state:LOOKING, n.leader:0, n.round:0x2, n.peerEpoch:0x1, n.zxid:0x10000015f, message format version:0x2, n.config version:0x0 (FastLeaderElection$Messenger$WorkerReceiver:389) 
09-27 00:45:11,719 INFO (LearnerHandler-/90.33.116.214:59826) Accepted TLS connection from /90.33.116.214:59826 - TLSv1.3 - TLS_AES_128_GCM_SHA256 (UnifiedServerSocket$UnifiedSocket:266) 
09-27 00:45:11,719 INFO (LearnerHandler-/90.33.116.214:59826) Follower sid: 0 : info : 90.33.116.214:2878:3878:participant (LearnerHandler:504) 
09-27 00:45:11,719 INFO (LearnerHandler-/90.33.116.214:59826) On disk txn sync enabled with snapshotSizeFactor 0.33 (ZKDatabase:345) 
09-27 00:45:11,719 INFO (LearnerHandler-/90.33.116.214:59826) Synchronizing with Learner sid: 0 maxCommittedLog=0x100000163 minCommittedLog=0x100000001 lastProcessedZxid=0x100000163 peerLastZxid=0x10000015f (LearnerHandler:807) 
09-27 00:45:11,719 INFO (LearnerHandler-/90.33.116.214:59826) Using committedLog for peer sid: 0 (LearnerHandler:871) 
09-27 00:45:11,719 INFO (LearnerHandler-/90.33.116.214:59826) Sending DIFF zxid=0x100000163  for peer sid: 0 (LearnerHandler:979) 
09-27 00:45:11,721 ERROR(LearnerHandler-/90.33.116.214:59826) Unexpected exception causing shutdown while sock still open (LearnerHandler:714) 
java.io.EOFException: null
    at java.io.DataInputStream.readInt(DataInputStream.java:393) ~[?:1.8.0_342]
    at org.apache.jute.BinaryInputArchive.readInt(BinaryInputArchive.java:96) ~[zookeeper-jute-3.6.3.jar:3.6.3]
    at org.apache.zookeeper.server.quorum.QuorumPacket.deserialize(QuorumPacket.java:86) ~[zookeeper-jute-3.6.3.jar:3.6.3]
    at org.apache.jute.BinaryInputArchive.readRecord(BinaryInputArchive.java:134) ~[zookeeper-jute-3.6.3.jar:3.6.3]
    at org.apache.zookeeper.server.quorum.LearnerHandler.run(LearnerHandler.java:611) [zookeeper-3.6.3-h0.gdd.pub.r286.jar:3.6.3]
09-27 00:45:11,721 WARN (LearnerHandler-/90.33.116.214:59826) ******* GOODBYE /90.33.116.214:59826 ******** (LearnerHandler:737) 
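
The log above shows a single accept / EOFException / GOODBYE cycle for the follower at 90.33.116.214. To check whether that cycle repeats often enough to matter, the "Accepted TLS connection" messages can be counted per remote host. Below is a minimal sketch in Java, assuming the log format shown above. The class name TlsAcceptCounter and the log file argument are hypothetical, and any link between reconnect frequency and the SSLSessionImpl growth is an assumption, not something confirmed in this report.

```java
import java.nio.file.Files;
import java.nio.file.Paths;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class TlsAcceptCounter {
    // Matches the "Accepted TLS connection from /host:port" message seen in the log.
    // Assumes IPv4 addresses or host names without ':' in them.
    private static final Pattern ACCEPTED =
            Pattern.compile("Accepted TLS connection from /([^:\\s]+):\\d+");

    /** Counts accepted TLS connections per remote host, in order of first appearance. */
    public static Map<String, Integer> countByHost(List<String> lines) {
        Map<String, Integer> counts = new LinkedHashMap<>();
        for (String line : lines) {
            Matcher m = ACCEPTED.matcher(line);
            if (m.find()) {
                counts.merge(m.group(1), 1, Integer::sum);
            }
        }
        return counts;
    }

    public static void main(String[] args) throws Exception {
        // args[0] is the path to a ZooKeeper server log (placeholder).
        List<String> lines = Files.readAllLines(Paths.get(args[0]));
        countByHost(lines).forEach((host, n) -> System.out.println(host + " " + n));
    }
}
```

Run it against the leader's log, for example `java TlsAcceptCounter zookeeper.log` (the file name is a placeholder). A count that keeps climbing for one host would be consistent with a reconnect loop from that follower.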

 

 

How can I fix this problem?

  was:
ZooKeeper is deployed on three different nodes to form a cluster. The disk capacity of two nodes is fully occupied and files cannot be written. The memory of the third node overflows.

 

How can I fix this problem?

 

thread

 

 

 

Large Object

sun.security.ssl.SSLSocketImpl @ 0xe541fce8
  sun.security.ssl.TransportContext @ 0xe541ff78
    sun.security.ssl.SSLSessionImpl @ 0xe5427d20
      java.util.concurrent.ConcurrentLinkedQueue @ 0xe5427ec0
        java.util.concurrent.ConcurrentLinkedQueue$Node @ 0xe5427ed8
        java.util.concurrent.ConcurrentLinkedQueue$Node @ 0xe5427ef0
        sun.security.ssl.SSLSessionImpl @ 0xe5427f08
        java.util.concurrent.ConcurrentLinkedQueue @ 0xe54280a8
          java.util.concurrent.ConcurrentLinkedQueue$Node @ 0xe54280c0
            java.util.concurrent.ConcurrentLinkedQueue$Node @ 0xe54280d8
              sun.security.ssl.SSLSessionImpl @ 0xe54280f0
                 java.util.concurrent.ConcurrentLinkedQueue @ 0xe54281f8
                    java.util.concurrent.ConcurrentLinkedQueue$Node @ 0xe5428210

 

 

log

 

09-27 00:45:11,708 INFO (WorkerReceiver[myid=2]) Notification: my state:LEADING; n.sid:0, n.state:LOOKING, n.leader:0, n.round:0x2, n.peerEpoch:0x1, n.zxid:0x10000015f, message format version:0x2, n.config version:0x0 (FastLeaderElection$Messenger$WorkerReceiver:389) 
09-27 00:45:11,719 INFO (LearnerHandler-/90.33.116.214:59826) Accepted TLS connection from /90.33.116.214:59826 - TLSv1.3 - TLS_AES_128_GCM_SHA256 (UnifiedServerSocket$UnifiedSocket:266) 
09-27 00:45:11,719 INFO (LearnerHandler-/90.33.116.214:59826) Follower sid: 0 : info : 90.33.116.214:2878:3878:participant (LearnerHandler:504) 
09-27 00:45:11,719 INFO (LearnerHandler-/90.33.116.214:59826) On disk txn sync enabled with snapshotSizeFactor 0.33 (ZKDatabase:345) 
09-27 00:45:11,719 INFO (LearnerHandler-/90.33.116.214:59826) Synchronizing with Learner sid: 0 maxCommittedLog=0x100000163 minCommittedLog=0x100000001 lastProcessedZxid=0x100000163 peerLastZxid=0x10000015f (LearnerHandler:807) 
09-27 00:45:11,719 INFO (LearnerHandler-/90.33.116.214:59826) Using committedLog for peer sid: 0 (LearnerHandler:871) 
09-27 00:45:11,719 INFO (LearnerHandler-/90.33.116.214:59826) Sending DIFF zxid=0x100000163  for peer sid: 0 (LearnerHandler:979) 
09-27 00:45:11,721 ERROR(LearnerHandler-/90.33.116.214:59826) Unexpected exception causing shutdown while sock still open (LearnerHandler:714) 
java.io.EOFException: null
    at java.io.DataInputStream.readInt(DataInputStream.java:393) ~[?:1.8.0_342]
    at org.apache.jute.BinaryInputArchive.readInt(BinaryInputArchive.java:96) ~[zookeeper-jute-3.6.3.jar:3.6.3]
    at org.apache.zookeeper.server.quorum.QuorumPacket.deserialize(QuorumPacket.java:86) ~[zookeeper-jute-3.6.3.jar:3.6.3]
    at org.apache.jute.BinaryInputArchive.readRecord(BinaryInputArchive.java:134) ~[zookeeper-jute-3.6.3.jar:3.6.3]
    at org.apache.zookeeper.server.quorum.LearnerHandler.run(LearnerHandler.java:611) [zookeeper-3.6.3-h0.gdd.pub.r286.jar:3.6.3]
09-27 00:45:11,721 WARN (LearnerHandler-/90.33.116.214:59826) ******* GOODBYE /90.33.116.214:59826 ******** (LearnerHandler:737) 

 

 


> zookeeper oom
> -------------
>
>                 Key: ZOOKEEPER-4623
>                 URL: https://issues.apache.org/jira/browse/ZOOKEEPER-4623
>             Project: ZooKeeper
>          Issue Type: Bug
>    Affects Versions: 3.6.3
>            Reporter: hejie
>            Priority: Major
>
> ZooKeeper is deployed as a three-node cluster. The disks on two of the nodes are full, so files can no longer be written there. The third node runs out of memory.
>  
> A large number of SSLSessionImpl objects appear to have accumulated, causing the memory overflow.
>  
>  
> Large Object:
> sun.security.ssl.SSLSocketImpl @ 0xe541fce8
>   sun.security.ssl.TransportContext @ 0xe541ff78
>     sun.security.ssl.SSLSessionImpl @ 0xe5427d20
>       java.util.concurrent.ConcurrentLinkedQueue @ 0xe5427ec0
>         java.util.concurrent.ConcurrentLinkedQueue$Node @ 0xe5427ed8
>         java.util.concurrent.ConcurrentLinkedQueue$Node @ 0xe5427ef0
>         sun.security.ssl.SSLSessionImpl @ 0xe5427f08
>         java.util.concurrent.ConcurrentLinkedQueue @ 0xe54280a8
>           java.util.concurrent.ConcurrentLinkedQueue$Node @ 0xe54280c0
>             java.util.concurrent.ConcurrentLinkedQueue$Node @ 0xe54280d8
>               sun.security.ssl.SSLSessionImpl @ 0xe54280f0
>                  java.util.concurrent.ConcurrentLinkedQueue @ 0xe54281f8
>                     java.util.concurrent.ConcurrentLinkedQueue$Node @ 0xe5428210
>  
>  
> Thread:
>  
> org.apache.zookeeper.server.quorum.LearnerHandler#run
> org.apache.jute.BinaryInputArchive#readRecord
> xxx
> org.apache.zookeeper.server.quorum.UnifiedServerSocket.UnifiedSocket#detectMode
> xxxx
> LOG.info("Accepted TLS connection from {} - {} - {}",sslSocket.getRemoteSocketAddress(),sslSocket.getSession().getProtocol(),sslSocket.getSession().getCipherSuite());
> xxxx
> sun.security.ssl.SSLSessionImpl#addChild
>  
>  
>  
>  
>  
> log:
>  
> 09-27 00:45:11,708 INFO (WorkerReceiver[myid=2]) Notification: my state:LEADING; n.sid:0, n.state:LOOKING, n.leader:0, n.round:0x2, n.peerEpoch:0x1, n.zxid:0x10000015f, message format version:0x2, n.config version:0x0 (FastLeaderElection$Messenger$WorkerReceiver:389) 
> 09-27 00:45:11,719 INFO (LearnerHandler-/90.33.116.214:59826) Accepted TLS connection from /90.33.116.214:59826 - TLSv1.3 - TLS_AES_128_GCM_SHA256 (UnifiedServerSocket$UnifiedSocket:266) 
> 09-27 00:45:11,719 INFO (LearnerHandler-/90.33.116.214:59826) Follower sid: 0 : info : 90.33.116.214:2878:3878:participant (LearnerHandler:504) 
> 09-27 00:45:11,719 INFO (LearnerHandler-/90.33.116.214:59826) On disk txn sync enabled with snapshotSizeFactor 0.33 (ZKDatabase:345) 
> 09-27 00:45:11,719 INFO (LearnerHandler-/90.33.116.214:59826) Synchronizing with Learner sid: 0 maxCommittedLog=0x100000163 minCommittedLog=0x100000001 lastProcessedZxid=0x100000163 peerLastZxid=0x10000015f (LearnerHandler:807) 
> 09-27 00:45:11,719 INFO (LearnerHandler-/90.33.116.214:59826) Using committedLog for peer sid: 0 (LearnerHandler:871) 
> 09-27 00:45:11,719 INFO (LearnerHandler-/90.33.116.214:59826) Sending DIFF zxid=0x100000163  for peer sid: 0 (LearnerHandler:979) 
> 09-27 00:45:11,721 ERROR(LearnerHandler-/90.33.116.214:59826) Unexpected exception causing shutdown while sock still open (LearnerHandler:714) 
> java.io.EOFException: null
>     at java.io.DataInputStream.readInt(DataInputStream.java:393) ~[?:1.8.0_342]
>     at org.apache.jute.BinaryInputArchive.readInt(BinaryInputArchive.java:96) ~[zookeeper-jute-3.6.3.jar:3.6.3]
>     at org.apache.zookeeper.server.quorum.QuorumPacket.deserialize(QuorumPacket.java:86) ~[zookeeper-jute-3.6.3.jar:3.6.3]
>     at org.apache.jute.BinaryInputArchive.readRecord(BinaryInputArchive.java:134) ~[zookeeper-jute-3.6.3.jar:3.6.3]
>     at org.apache.zookeeper.server.quorum.LearnerHandler.run(LearnerHandler.java:611) [zookeeper-3.6.3-h0.gdd.pub.r286.jar:3.6.3]
> 09-27 00:45:11,721 WARN (LearnerHandler-/90.33.116.214:59826) ******* GOODBYE /90.33.116.214:59826 ******** (LearnerHandler:737) 
>  
>  
> How can I fix this problem?


