Posted to user@hbase.apache.org by jeevi tesh <je...@gmail.com> on 2015/06/01 21:15:21 UTC

zookeeper closing socket connection exception

Hi,
I keep running into this issue and am still not able to resolve it; kindly
help me in this regard.
I have written a crawler that keeps running for several days. After 4 days
of continuous interaction between the database and my application, the
database fails to respond. I'm not able to figure out what can suddenly go
wrong after 4 days of proper running.
My configuration: HBase 0.96.2 on a single server, JDK 1.7.

The issue is the following error:
WARN  [http-bio-8080-exec-4-SendThread(hadoop2:2181)] zookeeper.ClientCnxn
(ClientCnxn.java:run(1089)) - Session 0x14da00e69e001ad for server null,
unexpected error, closing socket connection and attempting reconnect
java.net.ConnectException: Connection refused
When this exception happens, the only solution I have is to restart HBase,
which is not viable because that will corrupt my system data.

Re: zookeeper closing socket connection exception

Posted by Ted Yu <yu...@gmail.com>.
How much heap did you give the region server ?

How much total memory does the box have ?

I guess you have read http://hbase.apache.org/book.html#jvm

If you're using jdk 1.7.0_60 or newer, you can consider using G1GC.
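For reference, these settings would typically go in hbase-env.sh; a sketch with illustrative values only, not recommendations for this box:

```shell
# hbase-env.sh -- GC settings sketch; heap size and pause target are
# illustrative values, tune for your hardware.
HBASE_OPTS="${HBASE_OPTS:-}"   # keep any options already set

export HBASE_HEAPSIZE=4096     # daemon heap in MB (example value)

# On jdk 1.7.0_60 or newer, G1 with a pause-time goal instead of ParNew/CMS:
export HBASE_OPTS="$HBASE_OPTS -XX:+UseG1GC -XX:MaxGCPauseMillis=100"

# GC logging makes long pauses like the ones in this thread easy to spot:
export HBASE_OPTS="$HBASE_OPTS -verbose:gc -XX:+PrintGCDetails -XX:+PrintGCDateStamps"
```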

Cheers

On Tue, Jun 2, 2015 at 3:26 AM, jeevi tesh <je...@gmail.com> wrote:

> First of all, thanks a lot for coming forward with a helping hand.
>
> Here are my answers to the questions you asked.
>
> How many zookeeper servers do you have? Or what is the number of clients
> you have running per host?
>
> Ans: I have only one Linux box, a single-node system. Basically, HBase is
> installed on that one machine.
>
> What is the configured value of maxClientCnxns in the ZooKeeper servers?
>
> Ans: We are using the default configuration; we have not set any new
> value in hbase-site.xml.
>
> Is the issue impacting clients only or is it also impacting the
> RegionServers?
>
> Ans: In this case the RegionServer, master node, and client are all on the
> same machine, because we have installed HBase on a single system.
>
> Have you looked into why the ZooKeeper server is no longer accepting
> connections?
>
> Ans: I checked the HBase logs at the moment my application broke; to me it
> looked like the JVM went into garbage collection and after that it never
> came back, which resulted in the exception. Is my interpretation correct?
> Kindly let me know.
>

Re: zookeeper closing socket connection exception

Posted by jeevi tesh <je...@gmail.com>.
First of all, thanks a lot for coming forward with a helping hand.

Here are my answers to the questions you asked.



How many zookeeper servers do you have? Or what is the number of clients
you have running per host?

Ans: I have only one Linux box, a single-node system. Basically, HBase is
installed on that one machine.



What is the configured value of maxClientCnxns in the ZooKeeper servers?

Ans: We are using the default configuration; we have not set any new
value in hbase-site.xml.
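If that cap ever did need raising with the default HBase-managed ZooKeeper, it could be done from hbase-site.xml via the pass-through property; a sketch, where 300 is only an illustrative value:

```xml
<!-- hbase-site.xml: per-host client connection cap for the ZooKeeper
     instance that HBase manages (illustrative value, not a recommendation) -->
<property>
  <name>hbase.zookeeper.property.maxClientCnxns</name>
  <value>300</value>
</property>
```

HBase copies `hbase.zookeeper.property.*` settings into the zoo.cfg it generates, so this is equivalent to setting `maxClientCnxns` on a standalone ZooKeeper.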



Is the issue impacting clients only or is it also impacting the
RegionServers?

Ans: In this case the RegionServer, master node, and client are all on the
same machine, because we have installed HBase on a single system.


Have you looked into why the ZooKeeper server is no longer accepting
connections?

Ans: I checked the HBase logs at the moment my application broke; to me it
looked like the JVM went into garbage collection and after that it never
came back, which resulted in the exception. Is my interpretation correct?
Kindly let me know.

Here is the complete log

2015-06-01 19:59:53,808 INFO  [pool-55-thread-1] master.HMaster: Master has
completed initialization

2015-06-01 19:59:53,808 INFO  [main-EventThread] zookeeper.ClientCnxn:
EventThread shut down

2015-06-01 20:00:46,431 INFO  [JvmPauseMonitor] util.JvmPauseMonitor:
Detected pause in JVM or host machine (eg GC): pause of approximately 6885ms

GC pool 'ParNew' had collection(s): count=1 time=7383ms

2015-06-01 20:00:46,431 INFO  [JvmPauseMonitor] util.JvmPauseMonitor:
Detected pause in JVM or host machine (eg GC): pause of approximately 6886ms

GC pool 'ParNew' had collection(s): count=1 time=7383ms

2015-06-01 20:00:47,032 WARN  [M:0;hadoop2:35923.oldLogCleaner]
cleaner.CleanerChore: A file cleanerM:0;hadoop2:35923.oldLogCleaner is
stopped, won't delete any more files
in:file:/home/hadoop/hbaseDataDir/oldWALs

2015-06-01 20:02:05,148 WARN  [M:0;hadoop2:35923.oldLogCleaner]
util.Sleeper: We slept 78116ms instead of 60000ms, this is likely due to a
long garbage collecting pause and it's usually bad, see
http://hbase.apache.org/book.html#trouble.rs.runtime.zkexpired

2015-06-01 20:02:05,148 WARN  [M:0;hadoop2:35923.archivedHFileCleaner]
util.Sleeper: We slept 78122ms instead of 60000ms, this is likely due to a
long garbage collecting pause and it's usually bad, see
http://hbase.apache.org/book.html#trouble.rs.runtime.zkexpired

2015-06-01 20:02:05,149 WARN
[hadoop2,35923,1432909409923-ClusterStatusChore] util.Sleeper: We slept
78128ms instead of 60000ms, this is likely due to a long garbage collecting
pause and it's usually bad, see
http://hbase.apache.org/book.html#trouble.rs.runtime.zkexpired

2015-06-01 20:02:05,149 WARN  [RS:0;hadoop2:40129] util.Sleeper: We slept
39687ms instead of 3000ms, this is likely due to a long garbage collecting
pause and it's usually bad, see
http://hbase.apache.org/book.html#trouble.rs.runtime.zkexpired

2015-06-01 20:02:05,151 WARN  [JvmPauseMonitor] util.JvmPauseMonitor:
Detected pause in JVM or host machine (eg GC): pause of approximately
39206ms

GC pool 'ParNew' had collection(s): count=1 time=39328ms

2015-06-01 20:02:05,151 WARN  [M:0;hadoop2:35923] util.Sleeper: We slept
39345ms instead of 100ms, this is likely due to a long garbage collecting
pause and it's usually bad, see
http://hbase.apache.org/book.html#trouble.rs.runtime.zkexpired

2015-06-01 20:02:05,151 WARN  [JvmPauseMonitor] util.JvmPauseMonitor:
Detected pause in JVM or host machine (eg GC): pause of approximately
39205ms

GC pool 'ParNew' had collection(s): count=1 time=39328ms

2015-06-01 20:02:05,151 INFO  [SessionTracker] server.ZooKeeperServer:
Expiring session 0x14da00e69e00001, timeout of 40000ms exceeded

2015-06-01 20:02:05,151 INFO  [RS:0;hadoop2:40129-SendThread(hadoop2:2181)]
zookeeper.ClientCnxn: Client session timed out, have not heard from server
in 52055ms for sessionid 0x14da00e69e00001, closing socket connection and
attempting reconnect

2015-06-01 20:02:05,151 INFO  [RS:0;hadoop2:40129-SendThread(hadoop2:2181)]
zookeeper.ClientCnxn: Client session timed out, have not heard from server
in 52053ms for sessionid 0x14da00e69e00004, closing socket connection and
attempting reconnect

2015-06-01 20:02:05,151 WARN
[hadoop2,35923,1432909409923.splitLogManagerTimeoutMonitor] util.Sleeper:
We slept 39713ms instead of 1000ms, this is likely due to a long garbage
collecting pause and it's usually bad, see
http://hbase.apache.org/book.html#trouble.rs.runtime.zkexpired

2015-06-01 20:02:05,155 WARN  [NIOServerCxn.Factory:0.0.0.0/0.0.0.0:2181]
server.NIOServerCnxn: caught end of stream exception

EndOfStreamException: Unable to read additional data from client sessionid
0x14da00e69e00001, likely client has closed socket

          at
org.apache.zookeeper.server.NIOServerCnxn.doIO(NIOServerCnxn.java:220)

          at
org.apache.zookeeper.server.NIOServerCnxnFactory.run(NIOServerCnxnFactory.java:208)

          at java.lang.Thread.run(Thread.java:745)
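The JvmPauseMonitor entries in this log come from a simple technique: a thread sleeps for a short fixed interval and compares the wall-clock time actually elapsed against the intended sleep; a large overshoot means the JVM (or the host) was stalled, typically by GC. A minimal hypothetical sketch of the idea (`PauseMonitorSketch` and its thresholds are illustrative, not the actual HBase class):

```java
// Sketch of the pause-detection idea behind HBase's JvmPauseMonitor.
// Class name and thresholds are illustrative, not HBase code.
public class PauseMonitorSketch {

    // Returns the extra delay beyond the intended sleep when it crosses the
    // warning threshold, or -1 when the overshoot is unremarkable.
    static long checkOnce(long intendedSleepMs, long actualElapsedMs,
                          long warnThresholdMs) {
        long extra = actualElapsedMs - intendedSleepMs;
        return extra > warnThresholdMs ? extra : -1;
    }

    public static void main(String[] args) throws InterruptedException {
        final long sleepMs = 500;   // intended sleep per iteration
        final long warnMs = 1000;   // overshoot that counts as a "pause"
        long start = System.nanoTime();
        Thread.sleep(sleepMs);
        long elapsedMs = (System.nanoTime() - start) / 1_000_000L;
        long pause = checkOnce(sleepMs, elapsedMs, warnMs);
        if (pause >= 0) {
            System.out.println("Detected pause of approximately " + pause + "ms");
        } else {
            System.out.println("No significant pause detected");
        }
    }
}
```

The log above shows the client silent for 52055ms, past the 40000ms session timeout, which is why SessionTracker expires session 0x14da00e69e00001. Raising zookeeper.session.timeout in hbase-site.xml is the usual mitigation, at the cost of slower detection of genuinely dead servers.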






Re: zookeeper closing socket connection exception

Posted by Ted Yu <yu...@gmail.com>.
How many zookeeper servers do you have ?

Cheers


Re: zookeeper closing socket connection exception

Posted by Esteban Gutierrez <es...@cloudera.com>.
Hi Jeevi,

Have you looked into why the ZooKeeper server is no longer accepting
connections? What is the number of clients you have running per host, and
what is the configured value of maxClientCnxns in the ZooKeeper servers?
Also, is the issue impacting clients only, or is it also impacting the
RegionServers?
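One way to answer the connection questions is ZooKeeper's four-letter-word commands: `echo stat | nc hadoop2 2181` shows server status and `echo cons | nc hadoop2 2181` lists every client connection (host/port taken from the logs in this thread). A small hypothetical helper to count connections from saved `cons` output:

```shell
# Count client connections recorded in the output of ZooKeeper's `cons`
# command, e.g. captured with:  echo cons | nc hadoop2 2181 > cons.txt
# Each connection prints as a line like:  /10.0.0.5:54321[1](queued=0,...)
count_conns() {
  grep -c '^[[:space:]]*/' "$1"
}
```

Comparing that count against the effective maxClientCnxns (60 in stock ZooKeeper; the HBase-managed default may differ) would show whether the per-host cap is being hit.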

cheers,
esteban.




--
Cloudera, Inc.


>