You are viewing a plain text version of this content. The canonical link for it is here.
Posted to dev@zookeeper.apache.org by "Sergey Maslyakov (JIRA)" <ji...@apache.org> on 2013/04/27 09:46:15 UTC

[jira] [Commented] (ZOOKEEPER-1043) Looped NPE at org.apache.zookeeper.server.NIOServerCnxn$Factory.run(NIOServerCnxn.java:244)

    [ https://issues.apache.org/jira/browse/ZOOKEEPER-1043?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13643588#comment-13643588 ] 

Sergey Maslyakov commented on ZOOKEEPER-1043:
---------------------------------------------

I observe very similar symptoms on a *3.4.5* Zookeeper that runs on Solaris10/x86 and JRE 1.6.0_27-b07. Please see the log below. The explanation that I found is same as what "J Ch" is talking about. First, Socket.setTcpNodelay() throws an exception that is not properly handled, which results in not attaching a socket to a selector, and then a flood of log messages caused by an NPE.

I would say it is a critical problem as I have not seen Zookeeper to leave this state on its own.

{code:none|title=Sample log}
2013-04-24 03:07:04,480 [myid:] - INFO  [NIOServerCxn.Factory:0.0.0.0/0.0.0.0:12181:NIOServerCnxnFactory@197] - Accepted socket connection from /10.64.133.196:54055
2013-04-24 03:07:04,481 [myid:] - WARN  [NIOServerCxn.Factory:0.0.0.0/0.0.0.0:12181:NIOServerCnxnFactory@220] - Ignoring exception
java.net.SocketException: Invalid argument
        at sun.nio.ch.Net.setIntOption0(Native Method)
        at sun.nio.ch.Net.setIntOption(Net.java:157)
        at sun.nio.ch.SocketChannelImpl$1.setInt(SocketChannelImpl.java:406)
        at sun.nio.ch.SocketOptsImpl.setBoolean(SocketOptsImpl.java:38)
        at sun.nio.ch.SocketOptsImpl$IP$TCP.noDelay(SocketOptsImpl.java:284)
        at sun.nio.ch.OptionAdaptor.setTcpNoDelay(OptionAdaptor.java:48)
        at sun.nio.ch.SocketAdaptor.setTcpNoDelay(SocketAdaptor.java:268)
        at org.apache.zookeeper.server.NIOServerCnxn.<init>(NIOServerCnxn.java:107)
        at org.apache.zookeeper.server.NIOServerCnxnFactory.createConnection(NIOServerCnxnFactory.java:161)
        at org.apache.zookeeper.server.NIOServerCnxnFactory.run(NIOServerCnxnFactory.java:202)
        at java.lang.Thread.run(Thread.java:619)
2013-04-24 03:07:04,482 [myid:] - WARN  [NIOServerCxn.Factory:0.0.0.0/0.0.0.0:12181:NIOServerCnxnFactory@218] - Ignoring unexpected runtime exception
java.lang.NullPointerException
        at org.apache.zookeeper.server.NIOServerCnxnFactory.run(NIOServerCnxnFactory.java:190)
        at java.lang.Thread.run(Thread.java:619) 
{code}

The fix seems to be pretty simple, however, it is very difficult to reproduce the problem for testing purposes.
                
> Looped NPE at org.apache.zookeeper.server.NIOServerCnxn$Factory.run(NIOServerCnxn.java:244)
> -------------------------------------------------------------------------------------------
>
>                 Key: ZOOKEEPER-1043
>                 URL: https://issues.apache.org/jira/browse/ZOOKEEPER-1043
>             Project: ZooKeeper
>          Issue Type: Bug
>    Affects Versions: 3.3.3
>         Environment: Sparc Solaris 10
> Java 6u17 64 bits
> 5 nodes ensemble
>            Reporter: César Álvarez Núñez
>         Attachments: ZOOKEEPER-1043.patch
>
>
> I'm sorry but I only have this log (which belongs to a "follower" node) and a previous message [Unexpected NodeCreated event after a reconnection.|http://mail-archives.apache.org/mod_mbox/zookeeper-user/201103.mbox/%3CAANLkTi=vmZ5v4W6FMhWg4XO6rJT89eGozGUE840bku0_@mail.gmail.com%3E] where I describe a potential side-effect at client side.
> {noformat}
> 2011-04-04 09:31:09,608 - INFO  [Snapshot Thread:FileTxnSnapLog@208][] - Snapshotting: 1700527e36
> 2011-04-04 09:31:09,653 - INFO  [SyncThread:1:FileTxnLog@197][] - Creating new log file: log.1700527e38
> 2011-04-04 10:13:39,287 - INFO  [NIOServerCxn.Factory:0.0.0.0/0.0.0.0:2301:NIOServerCnxn$Factory@251][] - Accepted socket connection from /XXX.XXX.XXX.69:1093
> 2011-04-04 10:13:39,371 - INFO  [NIOServerCxn.Factory:0.0.0.0/0.0.0.0:2301:NIOServerCnxn@777][] - Client attempting to establish new session at /XXX.XXX.XXX.69:1093
> 2011-04-04 10:13:39,376 - INFO  [CommitProcessor:1:NIOServerCnxn@1580][] - Established session 0x12ee79c4a720022 with negotiated timeout 20000 for client /XXX.XXX.XXX.69:1093
> 2011-04-04 12:04:11,131 - INFO  [SyncThread:1:FileTxnLog@197][] - Creating new log file: log.170053bf15
> 2011-04-04 12:04:11,131 - INFO  [Snapshot Thread:FileTxnSnapLog@208][] - Snapshotting: 170053bf17
> 2011-04-04 12:13:10,779 - INFO  [NIOServerCxn.Factory:0.0.0.0/0.0.0.0:2301:NIOServerCnxn$Factory@251][] - Accepted socket connection from /XXX.XXX.XXX.63:1817
> 2011-04-04 12:13:10,790 - INFO  [NIOServerCxn.Factory:0.0.0.0/0.0.0.0:2301:NIOServerCnxn@777][] - Client attempting to establish new session at /XXX.XXX.XXX.63:1817
> 2011-04-04 12:13:10,794 - INFO  [CommitProcessor:1:NIOServerCnxn@1580][] - Established session 0x12ee79c4a720023 with negotiated timeout 20000 for client /XXX.XXX.XXX.63:1817
> 2011-04-04 12:13:10,814 - WARN  [NIOServerCxn.Factory:0.0.0.0/0.0.0.0:2301:NIOServerCnxn@634][] - EndOfStreamException: Unable to read additional data from client sessionid 0x12ee79c4a720023, likely client has closed socket
> 2011-04-04 12:13:10,816 - INFO  [NIOServerCxn.Factory:0.0.0.0/0.0.0.0:2301:NIOServerCnxn@1435][] - Closed socket connection for client /XXX.XXX.XXX.63:1817 which had sessionid 0x12ee79c4a720023
> 2011-04-04 12:13:10,839 - INFO  [NIOServerCxn.Factory:0.0.0.0/0.0.0.0:2301:NIOServerCnxn$Factory@251][] - Accepted socket connection from /XXX.XXX.XXX.63:1814
> 2011-04-04 12:13:10,840 - WARN  [NIOServerCxn.Factory:0.0.0.0/0.0.0.0:2301:NIOServerCnxn$Factory@274][] - Ignoring exception
> java.net.SocketException: Invalid argument
>         at sun.nio.ch.Net.setIntOption0(Native Method)
>         at sun.nio.ch.Net.setIntOption(Unknown Source)
>         at sun.nio.ch.SocketChannelImpl$1.setInt(Unknown Source)
>         at sun.nio.ch.SocketOptsImpl.setBoolean(Unknown Source)
>         at sun.nio.ch.SocketOptsImpl$IP$TCP.noDelay(Unknown Source)
>         at sun.nio.ch.OptionAdaptor.setTcpNoDelay(Unknown Source)
>         at sun.nio.ch.SocketAdaptor.setTcpNoDelay(Unknown Source)
>         at org.apache.zookeeper.server.NIOServerCnxn.<init>(NIOServerCnxn.java:1367)
>         at org.apache.zookeeper.server.NIOServerCnxn$Factory.createConnection(NIOServerCnxn.java:215)
>         at org.apache.zookeeper.server.NIOServerCnxn$Factory.run(NIOServerCnxn.java:256)
> 2011-04-04 12:13:10,841 - WARN  [NIOServerCxn.Factory:0.0.0.0/0.0.0.0:2301:NIOServerCnxn$Factory@272][] - Ignoring unexpected runtime exception
> java.lang.NullPointerException
>         at org.apache.zookeeper.server.NIOServerCnxn$Factory.run(NIOServerCnxn.java:244)
> 2011-04-04 12:13:10,841 - WARN  [NIOServerCxn.Factory:0.0.0.0/0.0.0.0:2301:NIOServerCnxn$Factory@272][] - Ignoring unexpected runtime exception
> java.lang.NullPointerException
>         at org.apache.zookeeper.server.NIOServerCnxn$Factory.run(NIOServerCnxn.java:244)
> 2011-04-04 12:13:10,842 - WARN  [NIOServerCxn.Factory:0.0.0.0/0.0.0.0:2301:NIOServerCnxn$Factory@272][] - Ignoring unexpected runtime exception
> java.lang.NullPointerException
>         at org.apache.zookeeper.server.NIOServerCnxn$Factory.run(NIOServerCnxn.java:244)
> ...
> ...
> ...
> 2011-04-04 16:49:23,101 - WARN  [NIOServerCxn.Factory:0.0.0.0/0.0.0.0:2301:NIOServerCnxn$Factory@272][] - Ignoring unexpected runtime exception
> java.lang.NullPointerException
>         at org.apache.zookeeper.server.NIOServerCnxn$Factory.run(NIOServerCnxn.java:244)
> {noformat}

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira