Posted to user@hbase.apache.org by Wayne <wa...@gmail.com> on 2011/01/27 15:48:51 UTC

SocketTimeoutException caused by GC?

We have got .90 up and running well, but again after 24 hours of loading a
node went down. Under it all I assume it is a GC issue, but the GC logging
rolls every < 60 minutes so I can never see logs from 5 hours ago (working
on getting Scribe up to solve that). Most of our issues involve a node being
marked as dead after becoming unresponsive. It often starts with a socket
timeout. We can turn up the timeout for zookeeper but that is not dealing
with the issue.

Here is the first sign of trouble. Is the 1 min 34 second gap below most
likely a stop the world GC?

2011-01-27 07:00:43,716 INFO org.apache.hadoop.hbase.regionserver.wal.HLog:
Roll /hbase/.logs/x.x.x.6,60020,1295969329357/x.x.x.6%3A60020.1296111623011,
entries=242, filesize=69508440. New hlog
/hbase/.logs/x.x.x.6,60020,1295969329357/x.x.x.6%3A60020.1296111643436
2011-01-27 07:02:17,663 WARN org.apache.hadoop.hdfs.DFSClient:
DFSOutputStream ResponseProcessor exception for block
blk_-5705652521953118952_104835java.net.SocketTimeoutException: 69000 millis
timeout while waiting for channel to be ready for read. ch :
java.nio.channels.SocketChannel[connected local=/x.x.x.6:48141
remote=/x.x.x.6:50010]

It is followed by zookeeper complaining due to lack of a response.

2011-01-27 07:02:17,665 INFO org.apache.zookeeper.ClientCnxn: Client session
timed out, have not heard from server in 94590ms for sessionid
0x2dbdc88700000e, closing socket connection and attempting reconnect

There is also a message about the data node.

2011-01-27 07:02:17,665 WARN org.apache.hadoop.hdfs.DFSClient: Error
Recovery for block blk_4267667433820446273_104837 bad datanode[0]
x.x.x.6:50010

And eventually the node is brought down.

2011-01-27 07:02:17,783 FATAL
org.apache.hadoop.hbase.regionserver.HRegionServer: ABORTING region
server...

The data node also shows some errors.

2011-01-27 07:02:17,667 ERROR
org.apache.hadoop.hdfs.server.datanode.DataNode:
DatanodeRegistration(x.x.x.6:50010,
storageID=DS-1438948528-x.x.x.6-50010-1295969305669, infoPort=50075,
ipcPort=50020):DataXceiver java.net.SocketException: Connection reset


Any help, advice, ideas, or guesses would be greatly appreciated. Can anyone
sustain 30-40k writes/node/sec for days/weeks on end without using the bulk
loader? Am I rolling a rock uphill against the reality of the JVM?

Thanks.

Re: SocketTimeoutException caused by GC?

Posted by Ted Dunning <td...@maprtech.com>.
To amplify on this, the collections before tenuring are copying collections
and thus immune to fragmentation. CMS, however, stands for Concurrent Mark
Sweep, and mark-sweep collectors are subject to fragmentation.  When
fragmentation causes an allocation failure, you get a full copying GC.

If you can keep non-permanent allocations from ever reaching the CMS
generation, you can completely prevent fragmentation.  This would be a huge win.

If you can radically decrease the number of objects being tenured as a fraction
of the total tenured heap, you won't leave CMS with nothing to do,
but there is likely to be a threshold effect where fragmentation becomes a
non-issue because freed blocks get glued back together quickly enough to
keep the fragmentation at a workable level.

You can tell how well you are doing at this by printing the tenuring
distribution.  It is quite informative.
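
If it helps, a minimal way to watch that -- this assumes the stock HotSpot
flag -XX:+PrintTenuringDistribution, which should be available on your 6u23:

# Append to the GC logging line in hbase-env.sh; each young GC then prints
# the per-age survivor sizes alongside the other GC details.
export HBASE_OPTS="$HBASE_OPTS -XX:+PrintTenuringDistribution"

The age histogram it prints shows how much data survives each young GC, which
is exactly what you want to watch while tuning MaxTenuringThreshold.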

On Thu, Jan 27, 2011 at 4:44 PM, Stack <st...@duboce.net> wrote:

> > export HBASE_HEAPSIZE=8192
> > export HBASE_OPTS="-XX:+UseCMSInitiatingOccupancyOnly
> > -XX:CMSInitiatingOccupancyFraction=60 -XX:+CMSParallelRemarkEnabled
> > -XX:SurvivorRatio=8 -XX:NewRatio=3 -XX:MaxTenuringThreshold=1
> > -XX:+HeapDumpOnOutOfMemoryError -XX:+UseConcMarkSweepGC
> > -XX:+CMSIncrementalMode"
> > export HBASE_OPTS="$HBASE_OPTS -verbose:gc -XX:+PrintGCDetails
> > -XX:+PrintGCDateStamps -Xloggc:$HBASE_HOME/logs/gc-hbase.log"
> >
>
> You don't need this on your class of hardware: CMSIncrementalMode
>
> But the one that looks off is XX:MaxTenuringThreshold.  It struck me
> as odd (and Ted Dunning as frightening) as it means all gets promoted
> to old gen after one young gen GC.  Let the young gen churn some more
> before items earn a promotion.  Make it 5 or 7 or something.  See if
> you go longer between full GCs?  Initiating fraction of 60 is low.
> Its probably costing you a bunch of CPU.  Try the
> XX:MaxTenuringThreshold first.
>

Re: SocketTimeoutException caused by GC?

Posted by Stack <st...@duboce.net>.
On Thu, Jan 27, 2011 at 11:38 AM, Wayne <wa...@gmail.com> wrote:
> Should we schedule a rolling restart every 24 hours?

You could do this (see J-D's note).

You could try a different JVM (Todd tripped over this beauty recently,
where fixes added in u20 to address fragmentation seem to have
made things worse:
http://bugs.sun.com/bugdatabase/view_bug.do?bug_id=6999988 --- looking
at the u23 release notes, this issue is not yet fixed).  Not u18 -- that's
broken -- but maybe u16?


> export HBASE_HEAPSIZE=8192
> export HBASE_OPTS="-XX:+UseCMSInitiatingOccupancyOnly
> -XX:CMSInitiatingOccupancyFraction=60 -XX:+CMSParallelRemarkEnabled
> -XX:SurvivorRatio=8 -XX:NewRatio=3 -XX:MaxTenuringThreshold=1
> -XX:+HeapDumpOnOutOfMemoryError -XX:+UseConcMarkSweepGC
> -XX:+CMSIncrementalMode"
> export HBASE_OPTS="$HBASE_OPTS -verbose:gc -XX:+PrintGCDetails
> -XX:+PrintGCDateStamps -Xloggc:$HBASE_HOME/logs/gc-hbase.log"
>

You don't need this on your class of hardware: CMSIncrementalMode

But the one that looks off is -XX:MaxTenuringThreshold.  It struck me
as odd (and struck Ted Dunning as frightening) since it means everything
gets promoted to the old gen after one young-gen GC.  Let the young gen
churn some more before items earn a promotion.  Make it 5 or 7 or something
and see if you go longer between full GCs.  An initiating occupancy fraction
of 60 is low; it's probably costing you a bunch of CPU.  Try the
-XX:MaxTenuringThreshold change first.
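
For illustration, a hedged sketch of what that might look like in
hbase-env.sh -- the threshold and fraction below are guesses to be checked
against your own GC logs, not values we have verified on your hardware:

export HBASE_HEAPSIZE=8192
# Drop -XX:+CMSIncrementalMode, let objects survive a few young-gen GCs
# before promotion, and start CMS a bit later than 60% occupancy.
export HBASE_OPTS="-XX:+UseConcMarkSweepGC -XX:+UseCMSInitiatingOccupancyOnly \
  -XX:CMSInitiatingOccupancyFraction=70 -XX:+CMSParallelRemarkEnabled \
  -XX:SurvivorRatio=8 -XX:NewRatio=3 -XX:MaxTenuringThreshold=5 \
  -XX:+HeapDumpOnOutOfMemoryError"
export HBASE_OPTS="$HBASE_OPTS -verbose:gc -XX:+PrintGCDetails \
  -XX:+PrintGCDateStamps -Xloggc:$HBASE_HOME/logs/gc-hbase.log"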

St.Ack

>
> Thanks for your help!
>
>
> On Thu, Jan 27, 2011 at 1:53 PM, Stack <st...@duboce.net> wrote:
>
>> On Thu, Jan 27, 2011 at 6:48 AM, Wayne <wa...@gmail.com> wrote:
>> > We have got .90 up and running well, but again after 24 hours of loading
>> a
>> > node went down. Under it all I assume it is a GC issue, but the GC
>> logging
>> > rolls every < 60 minutes so I can never see logs from 5 hours ago
>> (working
>> > on getting Scribe up to solve that).
>>
>> Is it possible to ask GC logs roll?  Whats the config. please?
>>
>> > Most of our issues are a node being
>> > marked as dead after being un-responsive. It often starts with a socket
>> > timeout. We can turn up the timeout for zookeeper but that is not dealing
>> > with the issue.
>> >
>> > Here is the first sign of trouble. Is the 1 min 34 second gap below most
>> > likely a stop the world GC?
>> >
>> > 2011-01-27 07:00:43,716 INFO
>> org.apache.hadoop.hbase.regionserver.wal.HLog:
>> > Roll
>> /hbase/.logs/x.x.x.6,60020,1295969329357/x.x.x.6%3A60020.1296111623011,
>> > entries=242, filesize=69508440. New hlog
>> > /hbase/.logs/x.x.x.6,60020,1295969329357/x.x.x.6%3A60020.1296111643436
>> > 2011-01-27 07:02:17,663 WARN org.apache.hadoop.hdfs.DFSClient:
>> > DFSOutputStream ResponseProcessor exception for block
>> > blk_-5705652521953118952_104835java.net.SocketTimeoutException: 69000
>> millis
>> > timeout while waiting for channel to be ready for read. ch :
>> > java.nio.channels.SocketChannel[connected local=/x.x.x.6:48141
>> > remote=/x.x.x.6:50010]
>> >
>>
>> Possibly.  Whats the log history like before that?
>>
>> What was going on in the local DataNode at that time?  Seems like we
>> failed talking to it (I'm guessing x.x.x.6 the local datanode).  Could
>> be though because of GC pause.
>>
>>
>> > It is followed by zookeeper complaining due to lack of a response.
>> >
>> > 2011-01-27 07:02:17,665 INFO org.apache.zookeeper.ClientCnxn: Client
>> session
>> > timed out, have not heard from server in 94590ms for sessionid
>> > 0x2dbdc88700000e, closing socket connection and attempting reconnect
>> >
>>
>> This is strong indicator that there was a long GC pause around this time.
>>
>>
>> > There is also a message about the data node.
>> >
>> > 2011-01-27 07:02:17,665 WARN org.apache.hadoop.hdfs.DFSClient: Error
>> > Recovery for block blk_4267667433820446273_104837 bad datanode[0]
>> > x.x.x.6:50010
>> >
>>
>> This is probably just the DFSClient up in the RegionServer marking the
>> local datanode bad because of the above timeout talking to it
>> (possibly because of GC pause).  Its marking the first datanode in the
>> pipeline, probably the first, as bad.. Its probably going to try the
>> next node in the pipeline.
>>
>>
>> > And eventually the node is brought down.
>> >
>> > 2011-01-27 07:02:17,783 FATAL
>> > org.apache.hadoop.hbase.regionserver.HRegionServer: ABORTING region
>> > server...
>> >
>>
>> This happened because session expired on zk?  You should be able to
>> see up earlier in the log.
>>
>>
>> > The data node also shows some errors.
>> >
>> > 2011-01-27 07:02:17,667 ERROR
>> > org.apache.hadoop.hdfs.server.datanode.DataNode:
>> > DatanodeRegistration(x.x.x.6:50010,
>> > storageID=DS-1438948528-x.x.x.6-50010-1295969305669, infoPort=50075,
>> > ipcPort=50020):DataXceiver java.net.SocketException: Connection reset
>> >
>>
>> Oh, you did check the DN logs.
>>
>> So, does this look like DN logging the RS dropping the connection
>> because of timeout?  (Check the IPs -- see if it makes sense reading
>> it that way).
>>
>> >
>> > Any help, advice, ideas, or guesses would be greatly appreciated. Can
>> anyone
>> > sustain 30-40k writes/node/sec for days/weeks on end without using the
>> bulk
>> > loader? Am I rolling a rock uphill against the reality of the JVM?
>> >
>>
>> You are going to run into heap fragmentation that brings on the
>> stop-the-world GCs.  Some good progress has been made on this
>> high-priority issue recently, see
>> https://issues.apache.org/jira/browse/HBASE-3455, but a 'solution' is
>> still to do.  Even then, my guess is that we would not be able to
>> guarantee our never running into a stop-the-world GC.
>>
>> Are you all about writes?  If so, the bulk loader runs at one to two
>> orders of magnitude faster than going via API.  For this reason alone,
>> its worth exploring (You can do incremental bulk loads to a running
>> table).  If reads are rare, you might down the size of your heap so
>> the stop-the-worlds run quicker or you try giving HBase a really large
>> heap; perhaps the elbow room will make it so we avoid fragmentation
>> (but it might make it worse).   Reads and reading from cache do not
>> seem to bring on the stop-the-world GC the way writes do.  Are your
>> writes latency sensitive such that you can't tolerate the occasional
>> long pause?  If not, up your zk lease timeouts.
>>
>> St.Ack
>>
>

Re: SocketTimeoutException caused by GC?

Posted by Jean-Daniel Cryans <jd...@apache.org>.
Doing it from the master is what we aim to do, but it's a lot more
involved. I know that jgray has a few ideas on the subject.

J-D

On Thu, Jan 27, 2011 at 4:25 PM, Ted Dunning <td...@maprtech.com> wrote:
> Why doesn't the master do this?  Why not just set it up so that you can tell
> the master that the maximum number of regions for the soon-to-go region
> server is 0?
>
> On Thu, Jan 27, 2011 at 3:53 PM, Jean-Daniel Cryans <jd...@apache.org>wrote:
>
>> Writing this gives me an idea... I think one "easy" way we could
>> achieve this region draining problem is by writing a jruby script
>> that:
>>
>> 1- Retrieves the list of regions served by a RS
>> 2- Disables master balancing
>> 3- Moves one by one every region out of the RS, assigning them to the
>> other RSs in a round-robin fashion
>> 4- Shuts down the RS
>> 5- Reenables master balancing
>>
>> I wonder if it would work... At least it's a process that you could
>> stop at any time without breaking everything.
>>
>

Re: SocketTimeoutException caused by GC?

Posted by Ted Dunning <td...@maprtech.com>.
Why doesn't the master do this?  Why not just set it up so that you can tell
the master that the maximum number of regions for the soon-to-go region
server is 0?

On Thu, Jan 27, 2011 at 3:53 PM, Jean-Daniel Cryans <jd...@apache.org>wrote:

> Writing this gives me an idea... I think one "easy" way we could
> achieve this region draining problem is by writing a jruby script
> that:
>
> 1- Retrieves the list of regions served by a RS
> 2- Disables master balancing
> 3- Moves one by one every region out of the RS, assigning them to the
> other RSs in a round-robin fashion
> 4- Shuts down the RS
> 5- Reenables master balancing
>
> I wonder if it would work... At least it's a process that you could
> stop at any time without breaking everything.
>

Re: SocketTimeoutException caused by GC?

Posted by Jean-Daniel Cryans <jd...@apache.org>.
Not as far as I know, unless you disabled splits from the beginning
like some ppl do.
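
If you really wanted to stop splits cluster-wide, one hedged workaround
(assuming the standard hbase.hregion.max.filesize setting, which is the
split threshold) is to raise it far above any region size you expect to
see, e.g. in hbase-site.xml:

<property>
  <name>hbase.hregion.max.filesize</name>
  <!-- ~100GB; regions should never reach this size, so they never split -->
  <value>107374182400</value>
</property>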

J-D

On Thu, Jan 27, 2011 at 4:22 PM, Ted Yu <yu...@gmail.com> wrote:
> Is there a way to disable splitting (on a particular region server) ?
>
> On Thu, Jan 27, 2011 at 4:20 PM, Jean-Daniel Cryans <jd...@apache.org>wrote:
>
>> Mmm yes for the sake of not having a single region that moved, but it
>> wouldn't be so bad... it just means that those regions will be closed
>> when the RS closes.
>>
>> Also it's possible to have splits during that time, again it's not
>> dramatic as long as the script doesn't freak out because a region is
>> gone.
>>
>> J-D
>>
>> On Thu, Jan 27, 2011 at 4:13 PM, Ted Yu <yu...@gmail.com> wrote:
>> > Should steps 1 and 2 below be exchanged ?
>> >
>> > Regards
>> >
>> > On Thu, Jan 27, 2011 at 3:53 PM, Jean-Daniel Cryans <jdcryans@apache.org
>> >wrote:
>> >
>> >> To mitigate heap fragmentation, you could consider adding more nodes
>> >> to the cluster :)
>> >>
>> >> Regarding rolling restarts, currently there's one major issue:
>> >> https://issues.apache.org/jira/browse/HBASE-3441
>> >>
>> >> How it currently works is a bit dumb, when you cleanly close a region
>> >> server it will first close all incoming connections and then will
>> >> procede to close the regions and it's not until it's fully done that
>> >> it will report to the master. What it means for your clients is that a
>> >> portion of the regions will become unavailable for some time until the
>> >> region server is done shutting down. How long you ask? Well it depends
>> >> on 1) how many regions you have but also mostly 2) how much data needs
>> >> to be flushed from the MemStores. On one of our clusters, shutting
>> >> down HBase takes a few minutes since our write pattern is almost
>> >> perfectly distributed meaning that all the memstore space is always
>> >> full from all the regions (luckily it's a cluster that serves only
>> >> mapreduce jobs).
>> >>
>> >> Writing this gives me an idea... I think one "easy" way we could
>> >> achieve this region draining problem is by writing a jruby script
>> >> that:
>> >>
>> >> 1- Retrieves the list of regions served by a RS
>> >> 2- Disables master balancing
>> >> 3- Moves one by one every region out of the RS, assigning them to the
>> >> other RSs in a round-robin fashion
>> >> 4- Shuts down the RS
>> >> 5- Reenables master balancing
>> >>
>> >> I wonder if it would work... At least it's a process that you could
>> >> stop at any time without breaking everything.
>> >>
>> >> J-D
>> >>
>> >> On Thu, Jan 27, 2011 at 11:38 AM, Wayne <wa...@gmail.com> wrote:
>> >> > I assumed GC was *trying* to roll. It shows the last 30min of logs
>> with
>> >> > control characters at the end.
>> >> >
>> >> > We are not all writes. In terms of writes we can wait and the
>> zookeeper
>> >> > timeout can go way up, but we also need to support real-time reads
>> (end
>> >> user
>> >> > based) and that is why the zookeeper timeout is not our first choice
>> to
>> >> > increase (we would rather decrease it). The funny part is that .90
>> seems
>> >> > faster for us and churns through writes at a faster clip thereby
>> probably
>> >> > becoming less stable sooner due to the JVM not being able to handle
>> it.
>> >> > Should we schedule a rolling restart every 24 hours? How do production
>> >> > systems accept volume writes through the front door without melting
>> the
>> >> JVM
>> >> > due to fragmentation? We can possibly switch to bulk writes but
>> >> performance
>> >> > is not our problem...stability is. We are pushing 40k writes/node/sec
>> >> > sustained with well balanced regions hour after hour day after day
>> (until
>> >> a
>> >> > zookeeper tear down).
>> >> >
>> >> > Great to hear it is actively being looked at. I will keep an eye on
>> >> #3455.
>> >> >
>> >> > Below are our GC options, many of which are from work with the other
>> java
>> >> > database. Should I go back to the default settings? Should I use those
>> >> > referenced in the Jira #3455 (-XX:+UseConcMarkSweepGC
>> >> > -XX:CMSInitiatingOccupancyFraction=65 -Xms8g -Xmx8g). We are also
>> using
>> >> > Java6u23.
>> >> >
>> >> >
>> >> > export HBASE_HEAPSIZE=8192
>> >> > export HBASE_OPTS="-XX:+UseCMSInitiatingOccupancyOnly
>> >> > -XX:CMSInitiatingOccupancyFraction=60 -XX:+CMSParallelRemarkEnabled
>> >> > -XX:SurvivorRatio=8 -XX:NewRatio=3 -XX:MaxTenuringThreshold=1
>> >> > -XX:+HeapDumpOnOutOfMemoryError -XX:+UseConcMarkSweepGC
>> >> > -XX:+CMSIncrementalMode"
>> >> > export HBASE_OPTS="$HBASE_OPTS -verbose:gc -XX:+PrintGCDetails
>> >> > -XX:+PrintGCDateStamps -Xloggc:$HBASE_HOME/logs/gc-hbase.log"
>> >> >
>> >> >
>> >> > Thanks for your help!
>> >> >
>> >> >
>> >>
>> >
>>
>

Re: SocketTimeoutException caused by GC?

Posted by Ted Yu <yu...@gmail.com>.
Is there a way to disable splitting (on a particular region server) ?

On Thu, Jan 27, 2011 at 4:20 PM, Jean-Daniel Cryans <jd...@apache.org>wrote:

> Mmm yes for the sake of not having a single region that moved, but it
> wouldn't be so bad... it just means that those regions will be closed
> when the RS closes.
>
> Also it's possible to have splits during that time, again it's not
> dramatic as long as the script doesn't freak out because a region is
> gone.
>
> J-D
>
> On Thu, Jan 27, 2011 at 4:13 PM, Ted Yu <yu...@gmail.com> wrote:
> > Should steps 1 and 2 below be exchanged ?
> >
> > Regards
> >
> > On Thu, Jan 27, 2011 at 3:53 PM, Jean-Daniel Cryans <jdcryans@apache.org
> >wrote:
> >
> >> To mitigate heap fragmentation, you could consider adding more nodes
> >> to the cluster :)
> >>
> >> Regarding rolling restarts, currently there's one major issue:
> >> https://issues.apache.org/jira/browse/HBASE-3441
> >>
> >> How it currently works is a bit dumb, when you cleanly close a region
> >> server it will first close all incoming connections and then will
> >> procede to close the regions and it's not until it's fully done that
> >> it will report to the master. What it means for your clients is that a
> >> portion of the regions will become unavailable for some time until the
> >> region server is done shutting down. How long you ask? Well it depends
> >> on 1) how many regions you have but also mostly 2) how much data needs
> >> to be flushed from the MemStores. On one of our clusters, shutting
> >> down HBase takes a few minutes since our write pattern is almost
> >> perfectly distributed meaning that all the memstore space is always
> >> full from all the regions (luckily it's a cluster that serves only
> >> mapreduce jobs).
> >>
> >> Writing this gives me an idea... I think one "easy" way we could
> >> achieve this region draining problem is by writing a jruby script
> >> that:
> >>
> >> 1- Retrieves the list of regions served by a RS
> >> 2- Disables master balancing
> >> 3- Moves one by one every region out of the RS, assigning them to the
> >> other RSs in a round-robin fashion
> >> 4- Shuts down the RS
> >> 5- Reenables master balancing
> >>
> >> I wonder if it would work... At least it's a process that you could
> >> stop at any time without breaking everything.
> >>
> >> J-D
> >>
> >> On Thu, Jan 27, 2011 at 11:38 AM, Wayne <wa...@gmail.com> wrote:
> >> > I assumed GC was *trying* to roll. It shows the last 30min of logs
> with
> >> > control characters at the end.
> >> >
> >> > We are not all writes. In terms of writes we can wait and the
> zookeeper
> >> > timeout can go way up, but we also need to support real-time reads
> (end
> >> user
> >> > based) and that is why the zookeeper timeout is not our first choice
> to
> >> > increase (we would rather decrease it). The funny part is that .90
> seems
> >> > faster for us and churns through writes at a faster clip thereby
> probably
> >> > becoming less stable sooner due to the JVM not being able to handle
> it.
> >> > Should we schedule a rolling restart every 24 hours? How do production
> >> > systems accept volume writes through the front door without melting
> the
> >> JVM
> >> > due to fragmentation? We can possibly switch to bulk writes but
> >> performance
> >> > is not our problem...stability is. We are pushing 40k writes/node/sec
> >> > sustained with well balanced regions hour after hour day after day
> (until
> >> a
> >> > zookeeper tear down).
> >> >
> >> > Great to hear it is actively being looked at. I will keep an eye on
> >> #3455.
> >> >
> >> > Below are our GC options, many of which are from work with the other
> java
> >> > database. Should I go back to the default settings? Should I use those
> >> > referenced in the Jira #3455 (-XX:+UseConcMarkSweepGC
> >> > -XX:CMSInitiatingOccupancyFraction=65 -Xms8g -Xmx8g). We are also
> using
> >> > Java6u23.
> >> >
> >> >
> >> > export HBASE_HEAPSIZE=8192
> >> > export HBASE_OPTS="-XX:+UseCMSInitiatingOccupancyOnly
> >> > -XX:CMSInitiatingOccupancyFraction=60 -XX:+CMSParallelRemarkEnabled
> >> > -XX:SurvivorRatio=8 -XX:NewRatio=3 -XX:MaxTenuringThreshold=1
> >> > -XX:+HeapDumpOnOutOfMemoryError -XX:+UseConcMarkSweepGC
> >> > -XX:+CMSIncrementalMode"
> >> > export HBASE_OPTS="$HBASE_OPTS -verbose:gc -XX:+PrintGCDetails
> >> > -XX:+PrintGCDateStamps -Xloggc:$HBASE_HOME/logs/gc-hbase.log"
> >> >
> >> >
> >> > Thanks for your help!
> >> >
> >> >
> >>
> >
>

Re: SocketTimeoutException caused by GC?

Posted by Jean-Daniel Cryans <jd...@apache.org>.
Mmm yes for the sake of not having a single region that moved, but it
wouldn't be so bad... it just means that those regions will be closed
when the RS closes.

Also it's possible to have splits during that time, again it's not
dramatic as long as the script doesn't freak out because a region is
gone.

J-D

On Thu, Jan 27, 2011 at 4:13 PM, Ted Yu <yu...@gmail.com> wrote:
> Should steps 1 and 2 below be exchanged ?
>
> Regards
>
> On Thu, Jan 27, 2011 at 3:53 PM, Jean-Daniel Cryans <jd...@apache.org>wrote:
>
>> To mitigate heap fragmentation, you could consider adding more nodes
>> to the cluster :)
>>
>> Regarding rolling restarts, currently there's one major issue:
>> https://issues.apache.org/jira/browse/HBASE-3441
>>
>> How it currently works is a bit dumb, when you cleanly close a region
>> server it will first close all incoming connections and then will
>> procede to close the regions and it's not until it's fully done that
>> it will report to the master. What it means for your clients is that a
>> portion of the regions will become unavailable for some time until the
>> region server is done shutting down. How long you ask? Well it depends
>> on 1) how many regions you have but also mostly 2) how much data needs
>> to be flushed from the MemStores. On one of our clusters, shutting
>> down HBase takes a few minutes since our write pattern is almost
>> perfectly distributed meaning that all the memstore space is always
>> full from all the regions (luckily it's a cluster that serves only
>> mapreduce jobs).
>>
>> Writing this gives me an idea... I think one "easy" way we could
>> achieve this region draining problem is by writing a jruby script
>> that:
>>
>> 1- Retrieves the list of regions served by a RS
>> 2- Disables master balancing
>> 3- Moves one by one every region out of the RS, assigning them to the
>> other RSs in a round-robin fashion
>> 4- Shuts down the RS
>> 5- Reenables master balancing
>>
>> I wonder if it would work... At least it's a process that you could
>> stop at any time without breaking everything.
>>
>> J-D
>>
>> On Thu, Jan 27, 2011 at 11:38 AM, Wayne <wa...@gmail.com> wrote:
>> > I assumed GC was *trying* to roll. It shows the last 30min of logs with
>> > control characters at the end.
>> >
>> > We are not all writes. In terms of writes we can wait and the zookeeper
>> > timeout can go way up, but we also need to support real-time reads (end
>> user
>> > based) and that is why the zookeeper timeout is not our first choice to
>> > increase (we would rather decrease it). The funny part is that .90 seems
>> > faster for us and churns through writes at a faster clip thereby probably
>> > becoming less stable sooner due to the JVM not being able to handle it.
>> > Should we schedule a rolling restart every 24 hours? How do production
>> > systems accept volume writes through the front door without melting the
>> JVM
>> > due to fragmentation? We can possibly switch to bulk writes but
>> performance
>> > is not our problem...stability is. We are pushing 40k writes/node/sec
>> > sustained with well balanced regions hour after hour day after day (until
>> a
>> > zookeeper tear down).
>> >
>> > Great to hear it is actively being looked at. I will keep an eye on
>> #3455.
>> >
>> > Below are our GC options, many of which are from work with the other java
>> > database. Should I go back to the default settings? Should I use those
>> > referenced in the Jira #3455 (-XX:+UseConcMarkSweepGC
>> > -XX:CMSInitiatingOccupancyFraction=65 -Xms8g -Xmx8g). We are also using
>> > Java6u23.
>> >
>> >
>> > export HBASE_HEAPSIZE=8192
>> > export HBASE_OPTS="-XX:+UseCMSInitiatingOccupancyOnly
>> > -XX:CMSInitiatingOccupancyFraction=60 -XX:+CMSParallelRemarkEnabled
>> > -XX:SurvivorRatio=8 -XX:NewRatio=3 -XX:MaxTenuringThreshold=1
>> > -XX:+HeapDumpOnOutOfMemoryError -XX:+UseConcMarkSweepGC
>> > -XX:+CMSIncrementalMode"
>> > export HBASE_OPTS="$HBASE_OPTS -verbose:gc -XX:+PrintGCDetails
>> > -XX:+PrintGCDateStamps -Xloggc:$HBASE_HOME/logs/gc-hbase.log"
>> >
>> >
>> > Thanks for your help!
>> >
>> >
>>
>

Re: SocketTimeoutException caused by GC?

Posted by Ted Yu <yu...@gmail.com>.
Should steps 1 and 2 below be exchanged ?

Regards

On Thu, Jan 27, 2011 at 3:53 PM, Jean-Daniel Cryans <jd...@apache.org>wrote:

> To mitigate heap fragmentation, you could consider adding more nodes
> to the cluster :)
>
> Regarding rolling restarts, currently there's one major issue:
> https://issues.apache.org/jira/browse/HBASE-3441
>
> How it currently works is a bit dumb, when you cleanly close a region
> server it will first close all incoming connections and then will
> procede to close the regions and it's not until it's fully done that
> it will report to the master. What it means for your clients is that a
> portion of the regions will become unavailable for some time until the
> region server is done shutting down. How long you ask? Well it depends
> on 1) how many regions you have but also mostly 2) how much data needs
> to be flushed from the MemStores. On one of our clusters, shutting
> down HBase takes a few minutes since our write pattern is almost
> perfectly distributed meaning that all the memstore space is always
> full from all the regions (luckily it's a cluster that serves only
> mapreduce jobs).
>
> Writing this gives me an idea... I think one "easy" way we could
> achieve this region draining problem is by writing a jruby script
> that:
>
> 1- Retrieves the list of regions served by a RS
> 2- Disables master balancing
> 3- Moves one by one every region out of the RS, assigning them to the
> other RSs in a round-robin fashion
> 4- Shuts down the RS
> 5- Reenables master balancing
>
> I wonder if it would work... At least it's a process that you could
> stop at any time without breaking everything.
>
> J-D
>
> On Thu, Jan 27, 2011 at 11:38 AM, Wayne <wa...@gmail.com> wrote:
> > I assumed GC was *trying* to roll. It shows the last 30min of logs with
> > control characters at the end.
> >
> > We are not all writes. In terms of writes we can wait and the zookeeper
> > timeout can go way up, but we also need to support real-time reads (end
> user
> > based) and that is why the zookeeper timeout is not our first choice to
> > increase (we would rather decrease it). The funny part is that .90 seems
> > faster for us and churns through writes at a faster clip thereby probably
> > becoming less stable sooner due to the JVM not being able to handle it.
> > Should we schedule a rolling restart every 24 hours? How do production
> > systems accept volume writes through the front door without melting the
> JVM
> > due to fragmentation? We can possibly switch to bulk writes but
> performance
> > is not our problem...stability is. We are pushing 40k writes/node/sec
> > sustained with well balanced regions hour after hour day after day (until
> a
> > zookeeper tear down).
> >
> > Great to hear it is actively being looked at. I will keep an eye on
> #3455.
> >
> > Below are our GC options, many of which are from work with the other java
> > database. Should I go back to the default settings? Should I use those
> > referenced in the Jira #3455 (-XX:+UseConcMarkSweepGC
> > -XX:CMSInitiatingOccupancyFraction=65 -Xms8g -Xmx8g). We are also using
> > Java6u23.
> >
> >
> > export HBASE_HEAPSIZE=8192
> > export HBASE_OPTS="-XX:+UseCMSInitiatingOccupancyOnly
> > -XX:CMSInitiatingOccupancyFraction=60 -XX:+CMSParallelRemarkEnabled
> > -XX:SurvivorRatio=8 -XX:NewRatio=3 -XX:MaxTenuringThreshold=1
> > -XX:+HeapDumpOnOutOfMemoryError -XX:+UseConcMarkSweepGC
> > -XX:+CMSIncrementalMode"
> > export HBASE_OPTS="$HBASE_OPTS -verbose:gc -XX:+PrintGCDetails
> > -XX:+PrintGCDateStamps -Xloggc:$HBASE_HOME/logs/gc-hbase.log"
> >
> >
> > Thanks for your help!
> >
> >
>

Re: SocketTimeoutException caused by GC?

Posted by Jean-Daniel Cryans <jd...@apache.org>.
To mitigate heap fragmentation, you could consider adding more nodes
to the cluster :)

Regarding rolling restarts, currently there's one major issue:
https://issues.apache.org/jira/browse/HBASE-3441

How it currently works is a bit dumb: when you cleanly close a region
server, it first closes all incoming connections, then proceeds to
close the regions, and it's not until it's fully done that it reports
to the master. What it means for your clients is that a
portion of the regions will become unavailable for some time until the
region server is done shutting down. How long, you ask? Well, it depends
on 1) how many regions you have, but also, mostly, 2) how much data needs
to be flushed from the MemStores. On one of our clusters, shutting
down HBase takes a few minutes since our write pattern is almost
perfectly distributed, meaning that the memstore space is always
full across all the regions (luckily it's a cluster that serves only
mapreduce jobs).

Writing this gives me an idea... I think one "easy" way we could
achieve this region draining problem is by writing a jruby script
that:

1- Retrieves the list of regions served by a RS
2- Disables master balancing
3- Moves one by one every region out of the RS, assigning them to the
other RSs in a round-robin fashion
4- Shuts down the RS
5- Reenables master balancing

I wonder if it would work... At least it's a process that you could
stop at any time without breaking everything.
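
Until someone writes the jruby version, a rough bash sketch over the hbase
shell of the same idea -- assuming the shell's balance_switch and move
commands are available in your release, and with the destination server
names and the region list file below being purely hypothetical placeholders:

#!/bin/bash
# Drain all regions off one region server before stopping it.
# regions_on_rs.txt holds the encoded names of the regions currently on the
# server being drained, gathered however you like (master web UI, .META. scan).
TARGETS=("rs1,60020,111" "rs2,60020,222")     # hypothetical destination servers

echo "balance_switch false" | hbase shell     # stop the master's balancer

i=0
while read region; do
  target=${TARGETS[$((i % ${#TARGETS[@]}))]}
  echo "move '$region', '$target'" | hbase shell   # round-robin the moves
  i=$((i + 1))
done < regions_on_rs.txt

./bin/hbase-daemon.sh stop regionserver       # shut down the drained RS
echo "balance_switch true" | hbase shell      # re-enable the balancer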

J-D

On Thu, Jan 27, 2011 at 11:38 AM, Wayne <wa...@gmail.com> wrote:
> I assumed GC was *trying* to roll. It shows the last 30min of logs with
> control characters at the end.
>
> We are not all writes. In terms of writes we can wait and the zookeeper
> timeout can go way up, but we also need to support real-time reads (end user
> based) and that is why the zookeeper timeout is not our first choice to
> increase (we would rather decrease it). The funny part is that .90 seems
> faster for us and churns through writes at a faster clip thereby probably
> becoming less stable sooner due to the JVM not being able to handle it.
> Should we schedule a rolling restart every 24 hours? How do production
> systems accept volume writes through the front door without melting the JVM
> due to fragmentation? We can possibly switch to bulk writes but performance
> is not our problem...stability is. We are pushing 40k writes/node/sec
> sustained with well balanced regions hour after hour day after day (until a
> zookeeper tear down).
>
> Great to hear it is actively being looked at. I will keep an eye on #3455.
>
> Below are our GC options, many of which are from work with the other java
> database. Should I go back to the default settings? Should I use those
> referenced in the Jira #3455 (-XX:+UseConcMarkSweepGC
> -XX:CMSInitiatingOccupancyFraction=65 -Xms8g -Xmx8g). We are also using
> Java6u23.
>
>
> export HBASE_HEAPSIZE=8192
> export HBASE_OPTS="-XX:+UseCMSInitiatingOccupancyOnly
> -XX:CMSInitiatingOccupancyFraction=60 -XX:+CMSParallelRemarkEnabled
> -XX:SurvivorRatio=8 -XX:NewRatio=3 -XX:MaxTenuringThreshold=1
> -XX:+HeapDumpOnOutOfMemoryError -XX:+UseConcMarkSweepGC
> -XX:+CMSIncrementalMode"
> export HBASE_OPTS="$HBASE_OPTS -verbose:gc -XX:+PrintGCDetails
> -XX:+PrintGCDateStamps -Xloggc:$HBASE_HOME/logs/gc-hbase.log"
>
>
> Thanks for your help!
>
>

Re: SocketTimeoutException caused by GC?

Posted by Wayne <wa...@gmail.com>.
I assumed GC was *trying* to roll. It shows the last 30min of logs with
control characters at the end.

We are not all writes. In terms of writes we can wait and the zookeeper
timeout can go way up, but we also need to support real-time reads (end user
based) and that is why the zookeeper timeout is not our first choice to
increase (we would rather decrease it). The funny part is that .90 seems
faster for us and churns through writes at a faster clip, thereby probably
becoming less stable sooner because the JVM can't keep up.
Should we schedule a rolling restart every 24 hours? How do production
systems accept volume writes through the front door without melting the JVM
due to fragmentation? We can possibly switch to bulk writes but performance
is not our problem...stability is. We are pushing 40k writes/node/sec
sustained with well balanced regions hour after hour day after day (until a
zookeeper tear down).

Great to hear it is actively being looked at. I will keep an eye on #3455.

Below are our GC options, many of which are from work with the other Java
database. Should I go back to the default settings? Or should I use those
referenced in HBASE-3455 (-XX:+UseConcMarkSweepGC
-XX:CMSInitiatingOccupancyFraction=65 -Xms8g -Xmx8g)? We are also using
Java 6u23.


export HBASE_HEAPSIZE=8192
export HBASE_OPTS="-XX:+UseCMSInitiatingOccupancyOnly
-XX:CMSInitiatingOccupancyFraction=60 -XX:+CMSParallelRemarkEnabled
-XX:SurvivorRatio=8 -XX:NewRatio=3 -XX:MaxTenuringThreshold=1
-XX:+HeapDumpOnOutOfMemoryError -XX:+UseConcMarkSweepGC
-XX:+CMSIncrementalMode"
export HBASE_OPTS="$HBASE_OPTS -verbose:gc -XX:+PrintGCDetails
-XX:+PrintGCDateStamps -Xloggc:$HBASE_HOME/logs/gc-hbase.log"
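
(One tweak we are considering so the GC history survives restarts: replace the
second export above so -Xloggc points at a per-start file name and old logs
are never overwritten -- the dated name is just our own convention:

export HBASE_OPTS="$HBASE_OPTS -verbose:gc -XX:+PrintGCDetails \
  -XX:+PrintGCDateStamps \
  -Xloggc:$HBASE_HOME/logs/gc-hbase-$(date +%Y%m%d-%H%M%S).log"

As far as I know 6u23 won't rotate the GC log on its own, so retention has to
come from outside -- Scribe, logrotate, or dated file names like this.)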


Thanks for your help!


On Thu, Jan 27, 2011 at 1:53 PM, Stack <st...@duboce.net> wrote:

> On Thu, Jan 27, 2011 at 6:48 AM, Wayne <wa...@gmail.com> wrote:
> > We have got .90 up and running well, but again after 24 hours of loading
> a
> > node went down. Under it all I assume it is a GC issue, but the GC
> logging
> > rolls every < 60 minutes so I can never see logs from 5 hours ago
> (working
> > on getting Scribe up to solve that).
>
> Is it possible to ask GC logs roll?  Whats the config. please?
>
> > Most of our issues are a node being
> > marked as dead after being un-responsive. It often starts with a socket
> > timeout. We can turn up the timeout for zookeeper but that is not dealing
> > with the issue.
> >
> > Here is the first sign of trouble. Is the 1 min 34 second gap below most
> > likely a stop the world GC?
> >
> > 2011-01-27 07:00:43,716 INFO
> org.apache.hadoop.hbase.regionserver.wal.HLog:
> > Roll
> /hbase/.logs/x.x.x.6,60020,1295969329357/x.x.x.6%3A60020.1296111623011,
> > entries=242, filesize=69508440. New hlog
> > /hbase/.logs/x.x.x.6,60020,1295969329357/x.x.x.6%3A60020.1296111643436
> > 2011-01-27 07:02:17,663 WARN org.apache.hadoop.hdfs.DFSClient:
> > DFSOutputStream ResponseProcessor exception for block
> > blk_-5705652521953118952_104835java.net.SocketTimeoutException: 69000
> millis
> > timeout while waiting for channel to be ready for read. ch :
> > java.nio.channels.SocketChannel[connected local=/x.x.x.6:48141
> > remote=/x.x.x.6:50010]
> >
>
> Possibly.  Whats the log history like before that?
>
> What was going on in the local DataNode at that time?  Seems like we
> failed talking to it (I'm guessing x.x.x.6 the local datanode).  Could
> be though because of GC pause.
>
>
> > It is followed by zookeeper complaining due to lack of a response.
> >
> > 2011-01-27 07:02:17,665 INFO org.apache.zookeeper.ClientCnxn: Client
> session
> > timed out, have not heard from server in 94590ms for sessionid
> > 0x2dbdc88700000e, closing socket connection and attempting reconnect
> >
>
> This is strong indicator that there was a long GC pause around this time.
>
>
> > There is also a message about the data node.
> >
> > 2011-01-27 07:02:17,665 WARN org.apache.hadoop.hdfs.DFSClient: Error
> > Recovery for block blk_4267667433820446273_104837 bad datanode[0]
> > x.x.x.6:50010
> >
>
> This is probably just the DFSClient up in the RegionServer marking the
> local datanode bad because of the above timeout talking to it
> (possibly because of GC pause).  Its marking the first datanode in the
> pipeline, probably the first, as bad.. Its probably going to try the
> next node in the pipeline.
>
>
> > And eventually the node is brought down.
> >
> > 2011-01-27 07:02:17,783 FATAL
> > org.apache.hadoop.hbase.regionserver.HRegionServer: ABORTING region
> > server...
> >
>
> This happened because session expired on zk?  You should be able to
> see up earlier in the log.
>
>
> > The data node also shows some errors.
> >
> > 2011-01-27 07:02:17,667 ERROR
> > org.apache.hadoop.hdfs.server.datanode.DataNode:
> > DatanodeRegistration(x.x.x.6:50010,
> > storageID=DS-1438948528-x.x.x.6-50010-1295969305669, infoPort=50075,
> > ipcPort=50020):DataXceiver java.net.SocketException: Connection reset
> >
>
> Oh, you did check the DN logs.
>
> So, does this look like DN logging the RS dropping the connection
> because of timeout?  (Check the IPs -- see if it makes sense reading
> it that way).
>
> >
> > Any help, advice, ideas, or guesses would be greatly appreciated. Can
> anyone
> > sustain 30-40k writes/node/sec for days/weeks on end without using the
> bulk
> > loader? Am I rolling a rock uphill against the reality of the JVM?
> >
>
> You are going to run into heap fragmentation that brings on the
> stop-the-world GCs.  Some good progress has been made on this
> high-priority issue recently, see
> https://issues.apache.org/jira/browse/HBASE-3455, but a 'solution' is
> still to do.  Even then, my guess is that we would not be able to
> guarantee our never running into a stop-the-world GC.
>
> Are you all about writes?  If so, the bulk loader runs at one to two
> orders of magnitude faster than going via API.  For this reason alone,
> its worth exploring (You can do incremental bulk loads to a running
> table).  If reads are rare, you might down the size of your heap so
> the stop-the-worlds run quicker or you try giving HBase a really large
> heap; perhaps the elbow room will make it so we avoid fragmentation
> (but it might make it worse).   Reads and reading from cache do not
> seem to bring on the stop-the-world GC the way writes do.  Are your
> writes latency sensitive such that you can't tolerate the occasional
> long pause?  If not, up your zk lease timeouts.
>
> St.Ack
>

Re: SocketTimeoutException caused by GC?

Posted by Stack <st...@duboce.net>.
On Thu, Jan 27, 2011 at 6:48 AM, Wayne <wa...@gmail.com> wrote:
> We have got .90 up and running well, but again after 24 hours of loading a
> node went down. Under it all I assume it is a GC issue, but the GC logging
> rolls every < 60 minutes so I can never see logs from 5 hours ago (working
> on getting Scribe up to solve that).

Is it possible to make the GC logs roll?  What's the config, please?

> Most of our issues are a node being
> marked as dead after being un-responsive. It often starts with a socket
> timeout. We can turn up the timeout for zookeeper but that is not dealing
> with the issue.
>
> Here is the first sign of trouble. Is the 1 min 34 second gap below most
> likely a stop the world GC?
>
> 2011-01-27 07:00:43,716 INFO org.apache.hadoop.hbase.regionserver.wal.HLog:
> Roll /hbase/.logs/x.x.x.6,60020,1295969329357/x.x.x.6%3A60020.1296111623011,
> entries=242, filesize=69508440. New hlog
> /hbase/.logs/x.x.x.6,60020,1295969329357/x.x.x.6%3A60020.1296111643436
> 2011-01-27 07:02:17,663 WARN org.apache.hadoop.hdfs.DFSClient:
> DFSOutputStream ResponseProcessor exception for block
> blk_-5705652521953118952_104835java.net.SocketTimeoutException: 69000 millis
> timeout while waiting for channel to be ready for read. ch :
> java.nio.channels.SocketChannel[connected local=/x.x.x.6:48141
> remote=/x.x.x.6:50010]
>

Possibly.  What's the log history like before that?

What was going on in the local DataNode at that time?  Seems like we
failed talking to it (I'm guessing x.x.x.6 is the local datanode).  Could
be because of a GC pause, though.


> It is followed by zookeeper complaining due to lack of a response.
>
> 2011-01-27 07:02:17,665 INFO org.apache.zookeeper.ClientCnxn: Client session
> timed out, have not heard from server in 94590ms for sessionid
> 0x2dbdc88700000e, closing socket connection and attempting reconnect
>

This is a strong indicator that there was a long GC pause around this time.


> There is also a message about the data node.
>
> 2011-01-27 07:02:17,665 WARN org.apache.hadoop.hdfs.DFSClient: Error
> Recovery for block blk_4267667433820446273_104837 bad datanode[0]
> x.x.x.6:50010
>

This is probably just the DFSClient up in the RegionServer marking the
local datanode bad because of the above timeout talking to it
(possibly because of the GC pause).  It's marking the first datanode in
the pipeline, probably the local one, as bad, and it's probably going to
try the next node in the pipeline.


> And eventually the node is brought down.
>
> 2011-01-27 07:02:17,783 FATAL
> org.apache.hadoop.hbase.regionserver.HRegionServer: ABORTING region
> server...
>

Did this happen because the session expired on zk?  You should be able to
see that earlier in the log.


> The data node also shows some errors.
>
> 2011-01-27 07:02:17,667 ERROR
> org.apache.hadoop.hdfs.server.datanode.DataNode:
> DatanodeRegistration(x.x.x.6:50010,
> storageID=DS-1438948528-x.x.x.6-50010-1295969305669, infoPort=50075,
> ipcPort=50020):DataXceiver java.net.SocketException: Connection reset
>

Oh, you did check the DN logs.

So, does this look like the DN logging the RS dropping the connection
because of the timeout?  (Check the IPs -- see if it makes sense reading
it that way).

>
> Any help, advice, ideas, or guesses would be greatly appreciated. Can anyone
> sustain 30-40k writes/node/sec for days/weeks on end without using the bulk
> loader? Am I rolling a rock uphill against the reality of the JVM?
>

You are going to run into heap fragmentation that brings on the
stop-the-world GCs.  Some good progress has been made on this
high-priority issue recently, see
https://issues.apache.org/jira/browse/HBASE-3455, but a 'solution' is
still to come.  Even then, my guess is that we would not be able to
guarantee never running into a stop-the-world GC.

Are you all about writes?  If so, the bulk loader runs one to two
orders of magnitude faster than going via the API.  For this reason alone,
it's worth exploring (you can do incremental bulk loads into a running
table).  If reads are rare, you might shrink your heap so the
stop-the-world pauses finish quicker, or try giving HBase a really large
heap; perhaps the elbow room will let us avoid fragmentation
(but it might make it worse).  Reads and reading from cache do not
seem to bring on the stop-the-world GC the way writes do.  Are your
writes latency sensitive such that you can't tolerate the occasional
long pause?  If not, up your zk lease timeouts.
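
A hedged sketch of two of the knobs above.  The session timeout lives in
hbase-site.xml (the three-minute value here is only an example; size it
against your longest observed pause):

<property>
  <name>zookeeper.session.timeout</name>
  <value>180000</value>
</property>

For incremental bulk loads into a live table, 0.90 ships a tool along these
lines (verify the class name and arguments against the bulk load docs for
your release; the HFile path and table name here are hypothetical):

hbase org.apache.hadoop.hbase.mapreduce.LoadIncrementalHFiles \
  /user/wayne/hfile-output mytable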

St.Ack

Re: SocketTimeoutException caused by GC?

Posted by Wayne <wa...@gmail.com>.
I only see the bad datanode error on the one node right before zookeeper
brought it down.

On Thu, Jan 27, 2011 at 10:53 AM, Ted Yu <yu...@gmail.com> wrote:

> About bad datanode error, I found 164 occurrences in 7-node dev cluster
> hbase 0.90 region server logs.
> In our 14 node staging cluster running hbase 0.20.6, I found none.
>
> Both use cdh3b2 hadoop.
>
> On Thu, Jan 27, 2011 at 6:48 AM, Wayne <wa...@gmail.com> wrote:
>
> > We have got .90 up and running well, but again after 24 hours of loading
> a
> > node went down. Under it all I assume it is a GC issue, but the GC
> logging
> > rolls every < 60 minutes so I can never see logs from 5 hours ago
> (working
> > on getting Scribe up to solve that). Most of our issues are a node being
> > marked as dead after being un-responsive. It often starts with a socket
> > timeout. We can turn up the timeout for zookeeper but that is not dealing
> > with the issue.
> >
> > Here is the first sign of trouble. Is the 1 min 34 second gap below most
> > likely a stop the world GC?
> >
> > 2011-01-27 07:00:43,716 INFO
> org.apache.hadoop.hbase.regionserver.wal.HLog:
> > Roll
> > /hbase/.logs/x.x.x.6,60020,1295969329357/x.x.x.6%3A60020.1296111623011,
> > entries=242, filesize=69508440. New hlog
> > /hbase/.logs/x.x.x.6,60020,1295969329357/x.x.x.6%3A60020.1296111643436
> > 2011-01-27 07:02:17,663 WARN org.apache.hadoop.hdfs.DFSClient:
> > DFSOutputStream ResponseProcessor exception for block
> > blk_-5705652521953118952_104835java.net.SocketTimeoutException: 69000
> > millis
> > timeout while waiting for channel to be ready for read. ch :
> > java.nio.channels.SocketChannel[connected local=/x.x.x.6:48141
> > remote=/x.x.x.6:50010]
> >
> > It is followed by zookeeper complaining due to lack of a response.
> >
> > 2011-01-27 07:02:17,665 INFO org.apache.zookeeper.ClientCnxn: Client
> > session
> > timed out, have not heard from server in 94590ms for sessionid
> > 0x2dbdc88700000e, closing socket connection and attempting reconnect
> >
> > There is also a message about the data node.
> >
> > 2011-01-27 07:02:17,665 WARN org.apache.hadoop.hdfs.DFSClient: Error
> > Recovery for block blk_4267667433820446273_104837 bad datanode[0]
> > x.x.x.6:50010
> >
> > And eventually the node is brought down.
> >
> > 2011-01-27 07:02:17,783 FATAL
> > org.apache.hadoop.hbase.regionserver.HRegionServer: ABORTING region
> > server...
> >
> > The data node also shows some errors.
> >
> > 2011-01-27 07:02:17,667 ERROR
> > org.apache.hadoop.hdfs.server.datanode.DataNode:
> > DatanodeRegistration(x.x.x.6:50010,
> > storageID=DS-1438948528-x.x.x.6-50010-1295969305669, infoPort=50075,
> > ipcPort=50020):DataXceiver java.net.SocketException: Connection reset
> >
> >
> > Any help, advice, ideas, or guesses would be greatly appreciated. Can
> > anyone
> > sustain 30-40k writes/node/sec for days/weeks on end without using the
> bulk
> > loader? Am I rolling a rock uphill against the reality of the JVM?
> >
> > Thanks.
> >
>

Re: SocketTimeoutException caused by GC?

Posted by Ted Yu <yu...@gmail.com>.
About the bad datanode error: I found 164 occurrences in our 7-node dev
cluster's hbase 0.90 region server logs.
In our 14-node staging cluster running hbase 0.20.6, I found none.

Both use cdh3b2 hadoop.

On Thu, Jan 27, 2011 at 6:48 AM, Wayne <wa...@gmail.com> wrote:

> We have got .90 up and running well, but again after 24 hours of loading a
> node went down. Under it all I assume it is a GC issue, but the GC logging
> rolls every < 60 minutes so I can never see logs from 5 hours ago (working
> on getting Scribe up to solve that). Most of our issues are a node being
> marked as dead after being un-responsive. It often starts with a socket
> timeout. We can turn up the timeout for zookeeper but that is not dealing
> with the issue.
>
> Here is the first sign of trouble. Is the 1 min 34 second gap below most
> likely a stop the world GC?
>
> 2011-01-27 07:00:43,716 INFO org.apache.hadoop.hbase.regionserver.wal.HLog:
> Roll
> /hbase/.logs/x.x.x.6,60020,1295969329357/x.x.x.6%3A60020.1296111623011,
> entries=242, filesize=69508440. New hlog
> /hbase/.logs/x.x.x.6,60020,1295969329357/x.x.x.6%3A60020.1296111643436
> 2011-01-27 07:02:17,663 WARN org.apache.hadoop.hdfs.DFSClient:
> DFSOutputStream ResponseProcessor exception for block
> blk_-5705652521953118952_104835java.net.SocketTimeoutException: 69000
> millis
> timeout while waiting for channel to be ready for read. ch :
> java.nio.channels.SocketChannel[connected local=/x.x.x.6:48141
> remote=/x.x.x.6:50010]
>
> It is followed by zookeeper complaining due to lack of a response.
>
> 2011-01-27 07:02:17,665 INFO org.apache.zookeeper.ClientCnxn: Client
> session
> timed out, have not heard from server in 94590ms for sessionid
> 0x2dbdc88700000e, closing socket connection and attempting reconnect
>
> There is also a message about the data node.
>
> 2011-01-27 07:02:17,665 WARN org.apache.hadoop.hdfs.DFSClient: Error
> Recovery for block blk_4267667433820446273_104837 bad datanode[0]
> x.x.x.6:50010
>
> And eventually the node is brought down.
>
> 2011-01-27 07:02:17,783 FATAL
> org.apache.hadoop.hbase.regionserver.HRegionServer: ABORTING region
> server...
>
> The data node also shows some errors.
>
> 2011-01-27 07:02:17,667 ERROR
> org.apache.hadoop.hdfs.server.datanode.DataNode:
> DatanodeRegistration(x.x.x.6:50010,
> storageID=DS-1438948528-x.x.x.6-50010-1295969305669, infoPort=50075,
> ipcPort=50020):DataXceiver java.net.SocketException: Connection reset
>
>
> Any help, advice, ideas, or guesses would be greatly appreciated. Can
> anyone
> sustain 30-40k writes/node/sec for days/weeks on end without using the bulk
> loader? Am I rolling a rock uphill against the reality of the JVM?
>
> Thanks.
>