Posted to dev@hbase.apache.org by Kannan Muthukkaruppan <Ka...@facebook.com> on 2010/01/26 04:14:01 UTC

scanner lease expired/region server shutdown

I was doing some brute force testing - running one instance of PerformanceEvaluation (PE) for writes, and another instance for randomReads.

One of the region servers went down after a while. [This is on 0.20.2]. The region server log had things like:

2010-01-25 11:33:39,416 WARN org.apache.hadoop.hbase.regionserver.HLog: IPC Server handler 34 on 60020 took 65190ms appending an edit to hlog; editcount=27878
2010-01-25 11:33:39,416 DEBUG org.apache.hadoop.hbase.io.hfile.LruBlockCache: Cache Stats: Sizes: Total=1815.4441MB (1903631104), Free=581.23096MB (609464832),\
 Max=2396.675MB (2513095936), Counts: Blocks=28216, Access=10904689, Hit=9313612, Miss=1591077, Evictions=110, Evicted=414336, Ratios: Hit Ratio=85.40924191474\
915%, Miss Ratio=14.590759575366974%, Evicted/Run=3766.69091796875
2010-01-25 11:33:39,417 WARN org.apache.hadoop.hbase.regionserver.HLog: IPC Server handler 52 on 60020 took 65195ms appending an edit to hlog; editcount=27879
2010-01-25 11:33:39,417 WARN org.apache.hadoop.hbase.regionserver.HLog: IPC Server handler 75 on 60020 took 65195ms appending an edit to hlog; editcount=27880
2010-01-25 11:33:39,417 INFO org.apache.hadoop.hbase.regionserver.HRegionServer: Scanner -5220035164735014996 lease expired

This was followed by:
2010-01-25 11:33:39,636 WARN org.apache.zookeeper.ClientCnxn: Exception closing session 0x12652c620520001 to sun.nio.ch.SelectionKeyImpl@788ab708<ma...@788ab708>
java.io.IOException: TIMED OUT

And also errors like:
2010-01-25 11:33:39,692 ERROR org.apache.hadoop.hbase.regionserver.HRegionServer:
org.apache.hadoop.hbase.UnknownScannerException: Name: -5220035164735014996

Eventually, the region server shutdown.
2010-01-25 11:33:40,293 INFO org.apache.hadoop.ipc.HBaseServer: IPC Server handler 85 on 60020, call get([B@2eec23d2, row=0007555231, maxVersions=1, timeRange=[0,9223372036854775807), families={(family=info, columns={data}}) from 10.131.1.186:51645: error: java.io.IOException: Server not running, aborting

Looks like for some reason the region server got overloaded and was not able to finish the scan (I am assuming this is the periodic meta/root region scan initiated by the master).

Any thoughts/ideas on the above?
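
For context on the "Scanner ... lease expired" and UnknownScannerException lines: the region server keeps a lease per open scanner and discards the scanner if no next() call arrives within hbase.regionserver.lease.period (roughly a minute by default in this era), so a stall of 65+ seconds on either side of the RPC has the same effect as an abandoned client. A minimal sketch of the client pattern involved, written against a roughly 0.20-era API (class and method names should be checked against your release; process() is a hypothetical placeholder):

    import java.io.IOException;
    import org.apache.hadoop.hbase.HBaseConfiguration;
    import org.apache.hadoop.hbase.client.HTable;
    import org.apache.hadoop.hbase.client.Result;
    import org.apache.hadoop.hbase.client.ResultScanner;
    import org.apache.hadoop.hbase.client.Scan;
    import org.apache.hadoop.hbase.util.Bytes;

    public class ScanLeaseSketch {
      public static void main(String[] args) throws IOException {
        HTable table = new HTable(new HBaseConfiguration(), "TestTable");
        Scan scan = new Scan();
        scan.addFamily(Bytes.toBytes("info"));
        ResultScanner scanner = table.getScanner(scan);
        try {
          for (Result row = scanner.next(); row != null; row = scanner.next()) {
            // If more than hbase.regionserver.lease.period elapses between
            // next() calls (slow client, or a long GC pause on the server),
            // the lease expires and the following next() fails with
            // UnknownScannerException.
            process(row);
          }
        } finally {
          scanner.close();
        }
      }

      private static void process(Result row) {
        // hypothetical per-row work
      }
    }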

Subsequently, I restarted the region server and just started the PE in sequentialWrite mode. It seems to be off to a really slow start. For several regions, it is printing the following:

10/01/25 18:12:44 DEBUG client.HConnectionManager$TableServers: locateRegionInMeta attempt 0 of 10 failed; retrying after sleep of 1000 because: region offline: TestTable,0031841132,1264121653303
10/01/25 18:12:44 DEBUG client.HConnectionManager$TableServers: Removed .META.,,1 for tableName=.META. from cache because of TestTable,0032000000,99999999999999
10/01/25 18:12:44 DEBUG client.HConnectionManager$TableServers: Cached location for .META.,,1 is 10.129.68.212:60020
10/01/25 18:12:45 DEBUG client.HConnectionManager$TableServers: locateRegionInMeta attempt 1 of 10 failed; retrying after sleep of 1000 because: region offline: TestTable,0031841132,1264121653303
10/01/25 18:12:45 DEBUG client.HConnectionManager$TableServers: Removed .META.,,1 for tableName=.META. from cache because of TestTable,0032000000,99999999999999
[..and so on..]
10/01/25 18:12:59 DEBUG client.HConnectionManager$TableServers: Cached location for .META.,,1 is 10.129.68.212:60020
10/01/25 18:13:07 DEBUG client.HConnectionManager$TableServers: locateRegionInMeta attempt 8 of 10 failed; retrying after sleep of 16000 because: region offline: TestTable,0031841132,1264121653303
10/01/25 18:13:07 DEBUG client.HConnectionManager$TableServers: Removed .META.,,1 for tableName=.META. from cache because of TestTable,0032000000,99999999999999
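
The retry cadence in this output ("attempt N of 10", sleeps growing from 1000 ms to 16000 ms) is the client's standard backoff while it re-reads the region location from .META.; it is driven by two client settings. A minimal sketch of reading (or raising) them, assuming the 0.20-era property names (verify names and defaults against your hbase-default.xml):

    import org.apache.hadoop.hbase.HBaseConfiguration;

    public class ClientRetrySettings {
      public static void main(String[] args) {
        HBaseConfiguration conf = new HBaseConfiguration();
        // Number of locateRegionInMeta attempts before the client gives up.
        int retries = conf.getInt("hbase.client.retries.number", 10);
        // Base sleep between attempts; later attempts back off to multiples
        // of this value (which is why the sleeps grow from 1000 to 16000 ms).
        long pauseMs = conf.getLong("hbase.client.pause", 1000);
        System.out.println("retries=" + retries + ", pause=" + pauseMs + "ms");
        // Raising these only buys the cluster more time to bring the split
        // daughters online; it does not fix an offline parent region.
        conf.setInt("hbase.client.retries.number", 20);
      }
    }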

When I scanned '.META.', I noticed that most of the regions for which I got the above error were in the middle of (?) a split. For example:

TestTable,0031841132,126412 column=info:regioninfo, timestamp=1264470099796, value=REGION => {NAME => 'TestTa
 1653303                     ble,0031841132,1264121653303', STARTKEY => '0031841132', ENDKEY => '0032057774',
                             ENCODED => 644500792, OFFLINE => true, SPLIT => true, TABLE => {{NAME => 'TestTab
                             le', FAMILIES => [{NAME => 'info', VERSIONS => '3', COMPRESSION => 'NONE', TTL =>
                              '2147483647', BLOCKSIZE => '65536', IN_MEMORY => 'false', BLOCKCACHE => 'true'}]
                             }}
 TestTable,0031841132,126412 column=info:server, timestamp=1264121655148, value=10.129.68.214:60020
 1653303
 TestTable,0031841132,126412 column=info:serverstartcode, timestamp=1264121655148, value=1264109117245
 1653303
 TestTable,0031841132,126412 column=info:splitA, timestamp=1264470099796, value=\000\n0031978284\000\000\000\0
 1653303                     01&g\016\357\277\275\024\"TestTable,0031841132,1264449401108\000\n0031841132\000\
                             000\000\005\tTestTable\000\000\000\000\000\002\000\000\000\aIS_ROOT\000\000\000\0
                             05false\000\000\000\aIS_META\000\000\000\005false\000\000\000\001\a\004info\000\0
                             00\000\a\000\000\000\vBLOOMFILTER\000\000\000\005false\000\000\000\vCOMPRESSION\0
                             00\000\000\004NONE\000\000\000\bVERSIONS\000\000\000\0013\000\000\000\003TTL\000\
                             000\000\n2147483647\000\000\000\tBLOCKSIZE\000\000\000\00565536\000\000\000\tIN_M
                             EMORY\000\000\000\005false\000\000\000\nBLOCKCACHE\000\000\000\004true\357\277\27
                             5\337\274#
 TestTable,0031841132,126412 column=info:splitB, timestamp=1264470099796, value=\000\n0032057774\000\000\000\0
 1653303                     01&g\016\357\277\275\024\"TestTable,0031978284,1264449401108\000\n0031978284\000\
                             000\000\005\tTestTable\000\000\000\000\000\002\000\000\000\aIS_ROOT\000\000\000\0
                             05false\000\000\000\aIS_META\000\000\000\005false\000\000\000\001\a\004info\000\0
                             00\000\a\000\000\000\vBLOOMFILTER\000\000\000\005false\000\000\000\vCOMPRESSION\0
                             00\000\000\004NONE\000\000\000\bVERSIONS\000\000\000\0013\000\000\000\003TTL\000\
                             000\000\n2147483647\000\000\000\tBLOCKSIZE\000\000\000\00565536\000\000\000\tIN_M
                             EMORY\000\000\000\005false\000\000\000\nBLOCKCACHE\000\000\000\004truej\357\277\2
                             75\312\262
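
The splitA/splitB cells above are serialized HRegionInfo records for the two daughter regions, and the parent's info:regioninfo is marked OFFLINE => true, SPLIT => true; normally the daughters get their own rows and the parent is cleaned up once its files are no longer referenced. A minimal sketch of walking .META. and flagging parents stuck in that state, written against a roughly 0.20-era client API (constant and method names should be checked against your release):

    import java.io.IOException;
    import org.apache.hadoop.hbase.HBaseConfiguration;
    import org.apache.hadoop.hbase.HRegionInfo;
    import org.apache.hadoop.hbase.client.HTable;
    import org.apache.hadoop.hbase.client.Result;
    import org.apache.hadoop.hbase.client.ResultScanner;
    import org.apache.hadoop.hbase.client.Scan;
    import org.apache.hadoop.hbase.util.Bytes;
    import org.apache.hadoop.hbase.util.Writables;

    public class ListSplitParents {
      public static void main(String[] args) throws IOException {
        HTable meta = new HTable(new HBaseConfiguration(), ".META.");
        Scan scan = new Scan();
        scan.addFamily(Bytes.toBytes("info"));
        ResultScanner rs = meta.getScanner(scan);
        try {
          for (Result r = rs.next(); r != null; r = rs.next()) {
            byte[] bytes = r.getValue(Bytes.toBytes("info"), Bytes.toBytes("regioninfo"));
            if (bytes == null) continue;
            HRegionInfo hri = Writables.getHRegionInfo(bytes);
            // A parent whose split never completed stays offline+split in
            // .META. while clients keep getting "region offline" for it.
            if (hri.isOffline() && hri.isSplit()) {
              System.out.println("offline split parent: " + hri.getRegionNameAsString());
            }
          }
        } finally {
          rs.close();
        }
      }
    }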

Any thoughts/ideas on what might be going on? I'd appreciate any input in helping understand it.

Regards,
Kannan



Re: scanner lease expired/region server shutdown

Posted by Jean-Daniel Cryans <jd...@apache.org>.
I've seen this before; this is a GC pause that took more than a
minute. It was enough to time out the scanner leases and the ZooKeeper
session.
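
Both of the timeouts that fired here default to roughly a minute in this era, which is why the 65-second stalls in the log (the 65190 ms / 65195 ms appends) were enough to expire the scanner leases and the ZooKeeper session together. A minimal sketch of checking the two knobs, assuming the 0.20-era property names and defaults (verify against your hbase-default.xml):

    import org.apache.hadoop.hbase.HBaseConfiguration;

    public class CheckTimeouts {
      public static void main(String[] args) {
        HBaseConfiguration conf = new HBaseConfiguration();
        // Scanner lease: how long the region server waits between next()
        // calls before dropping an open scanner ("Scanner ... lease expired").
        long leaseMs = conf.getLong("hbase.regionserver.lease.period", 60 * 1000);
        // ZooKeeper session: how long ZK waits for heartbeats before expiring
        // the region server's session ("java.io.IOException: TIMED OUT").
        long zkSessionMs = conf.getLong("zookeeper.session.timeout", 60 * 1000);
        System.out.println("scanner lease=" + leaseMs + "ms, zk session=" + zkSessionMs + "ms");
      }
    }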

J-D

On Mon, Jan 25, 2010 at 7:14 PM, Kannan Muthukkaruppan
<Ka...@facebook.com> wrote:
> I was doing some brute force testing - running one instance of PerformanceEvaluation (PE) for writes, and another instance for randomReads.
>
> One of the region servers went down after a while. [This is on 0.20.2]. The region server log had things like:
>
> 2010-01-25 11:33:39,416 WARN org.apache.hadoop.hbase.regionserver.HLog: IPC Server handler 34 on 60020 took 65190ms appending an edit to hlog; editcount=27878
> 2010-01-25 11:33:39,416 DEBUG org.apache.hadoop.hbase.io.hfile.LruBlockCache: Cache Stats: Sizes: Total=1815.4441MB (1903631104), Free=581.23096MB (609464832),\
>  Max=2396.675MB (2513095936), Counts: Blocks=28216, Access=10904689, Hit=9313612, Miss=1591077, Evictions=110, Evicted=414336, Ratios: Hit Ratio=85.40924191474\
> 915%, Miss Ratio=14.590759575366974%, Evicted/Run=3766.69091796875
> 2010-01-25 11:33:39,417 WARN org.apache.hadoop.hbase.regionserver.HLog: IPC Server handler 52 on 60020 took 65195ms appending an edit to hlog; editcount=27879
> 2010-01-25 11:33:39,417 WARN org.apache.hadoop.hbase.regionserver.HLog: IPC Server handler 75 on 60020 took 65195ms appending an edit to hlog; editcount=27880
> 2010-01-25 11:33:39,417 INFO org.apache.hadoop.hbase.regionserver.HRegionServer: Scanner -5220035164735014996 lease expired
>
> This was followed by:
> 2010-01-25 11:33:39,636 WARN org.apache.zookeeper.ClientCnxn: Exception closing session 0x12652c620520001 to sun.nio.ch.SelectionKeyImpl@788ab708<ma...@788ab708>
> java.io.IOException: TIMED OUT
>
> And also errors like:
> 2010-01-25 11:33:39,692 ERROR org.apache.hadoop.hbase.regionserver.HRegionServer:
> org.apache.hadoop.hbase.UnknownScannerException: Name: -5220035164735014996
>
> Eventually, the region server shutdown.
> 2010-01-25 11:33:40,293 INFO org.apache.hadoop.ipc.HBaseServer: IPC Server handler 85 on 60020, call get([B@2eec23d2, row=0007555231, maxVersions=1, timeRange=[0,9223372036854775807), families={(family=info, columns={data}}) from 10.131.1.186:51645: error: java.io.IOException: Server not running, aborting
>
> Looks like for some reason the region server got overloaded, and was not able to finish the scan (I am assuming this is the period meta/root region scan initiated by the master).
>
> Any thoughts/ideas on the above?
>
> Subsequently, I restarted the region server and just started the PE in sequentialWrite mode. It seems to be off to a really slow start. For several regions, it is printing the following:
>
> 10/01/25 18:12:44 DEBUG client.HConnectionManager$TableServers: locateRegionInMeta attempt 0 of 10 failed; retrying after sleep of 1000 because: region offline: TestTable,0031841132,1264121653303
> 10/01/25 18:12:44 DEBUG client.HConnectionManager$TableServers: Removed .META.,,1 for tableName=.META. from cache because of TestTable,0032000000,99999999999999
> 10/01/25 18:12:44 DEBUG client.HConnectionManager$TableServers: Cached location for .META.,,1 is 10.129.68.212:60020
> 10/01/25 18:12:45 DEBUG client.HConnectionManager$TableServers: locateRegionInMeta attempt 1 of 10 failed; retrying after sleep of 1000 because: region offline: TestTable,0031841132,1264121653303
> 10/01/25 18:12:45 DEBUG client.HConnectionManager$TableServers: Removed .META.,,1 for tableName=.META. from cache because of TestTable,0032000000,99999999999999
> [..and so on..]
> 10/01/25 18:12:59 DEBUG client.HConnectionManager$TableServers: Cached location for .META.,,1 is 10.129.68.212:60020
> 10/01/25 18:13:07 DEBUG client.HConnectionManager$TableServers: locateRegionInMeta attempt 8 of 10 failed; retrying after sleep of 16000 because: region offline: TestTable,0031841132,1264121653303
> 10/01/25 18:13:07 DEBUG client.HConnectionManager$TableServers: Removed .META.,,1 for tableName=.META. from cache because of TestTable,0032000000,99999999999999
>
> When I scan the '.META.', I noticed that most of the regions for which I get the above error were in the middle of (?) a split. For example:
>
> TestTable,0031841132,126412 column=info:regioninfo, timestamp=1264470099796, value=REGION => {NAME => 'TestTa
>  1653303                     ble,0031841132,1264121653303', STARTKEY => '0031841132', ENDKEY => '0032057774',
>                             ENCODED => 644500792, OFFLINE => true, SPLIT => true, TABLE => {{NAME => 'TestTab
>                             le', FAMILIES => [{NAME => 'info', VERSIONS => '3', COMPRESSION => 'NONE', TTL =>
>                              '2147483647', BLOCKSIZE => '65536', IN_MEMORY => 'false', BLOCKCACHE => 'true'}]
>                             }}
>  TestTable,0031841132,126412 column=info:server, timestamp=1264121655148, value=10.129.68.214:60020
>  1653303
>  TestTable,0031841132,126412 column=info:serverstartcode, timestamp=1264121655148, value=1264109117245
>  1653303
>  TestTable,0031841132,126412 column=info:splitA, timestamp=1264470099796, value=\000\n0031978284\000\000\000\0
>  1653303                     01&g\016\357\277\275\024\"TestTable,0031841132,1264449401108\000\n0031841132\000\
>                             000\000\005\tTestTable\000\000\000\000\000\002\000\000\000\aIS_ROOT\000\000\000\0
>                             05false\000\000\000\aIS_META\000\000\000\005false\000\000\000\001\a\004info\000\0
>                             00\000\a\000\000\000\vBLOOMFILTER\000\000\000\005false\000\000\000\vCOMPRESSION\0
>                             00\000\000\004NONE\000\000\000\bVERSIONS\000\000\000\0013\000\000\000\003TTL\000\
>                             000\000\n2147483647\000\000\000\tBLOCKSIZE\000\000\000\00565536\000\000\000\tIN_M
>                             EMORY\000\000\000\005false\000\000\000\nBLOCKCACHE\000\000\000\004true\357\277\27
>                             5\337\274#
>  TestTable,0031841132,126412 column=info:splitB, timestamp=1264470099796, value=\000\n0032057774\000\000\000\0
>  1653303                     01&g\016\357\277\275\024\"TestTable,0031978284,1264449401108\000\n0031978284\000\
>                             000\000\005\tTestTable\000\000\000\000\000\002\000\000\000\aIS_ROOT\000\000\000\0
>                             05false\000\000\000\aIS_META\000\000\000\005false\000\000\000\001\a\004info\000\0
>                             00\000\a\000\000\000\vBLOOMFILTER\000\000\000\005false\000\000\000\vCOMPRESSION\0
>                             00\000\000\004NONE\000\000\000\bVERSIONS\000\000\000\0013\000\000\000\003TTL\000\
>                             000\000\n2147483647\000\000\000\tBLOCKSIZE\000\000\000\00565536\000\000\000\tIN_M
>                             EMORY\000\000\000\005false\000\000\000\nBLOCKCACHE\000\000\000\004truej\357\277\2
>                             75\312\262
>
> Any thoughts/ideas on what might be going on? Appreciate any input in helping understand what might be going on.
>
> Regards,
> Kannan
>
>
>

RE: scanner lease expired/region server shutdown

Posted by Kannan Muthukkaruppan <Ka...@facebook.com>.
This was on a 2.6.20 kernel.

Nothing in /var/log/messages around that time either, so we may not have hit any system-wide limits. Perhaps the DataNode process hit its per-process limit (64k) under some corner-case/leak scenario.

I tried to reproduce it, but this time the lsof count on the DataNode and RegionServer processes stayed within 300 each under a similar workload, so I haven't been able to reproduce it.

regards,
Kannan
-----Original Message-----
From: jdcryans@gmail.com [mailto:jdcryans@gmail.com] On Behalf Of Jean-Daniel Cryans
Sent: Tuesday, January 26, 2010 1:54 PM
To: hbase-dev@hadoop.apache.org
Subject: Re: scanner lease expired/region server shutdown

One other source of Too many open files is if you have an old 2.6.27
http://pero.blogs.aprilmayjune.org/2009/01/22/hadoop-and-linux-kernel-2627-epoll-limits/

J-D

On Tue, Jan 26, 2010 at 12:58 PM, Kannan Muthukkaruppan
<Ka...@facebook.com> wrote:
> Dhruba: yes, the "Too many open files" exception is getting reported by the DN process. The same node is also running an HBase region server.
>
> And yes, I confirmed that the xcievers setting is 2048.
>
> Regards,
> Kannan
> -----Original Message-----
> From: Dhruba Borthakur [mailto:dhruba@gmail.com]
> Sent: Tuesday, January 26, 2010 10:10 AM
> To: hbase-dev@hadoop.apache.org
> Subject: Re: scanner lease expired/region server shutdown
>
> This exception is from the DataNode, right? This means that the datanode
> process has 32K files open simultaneously, how can that be? For each block
> read/write the datanode has two open files, one for the data and one for the
> .meta where the crc gets stored.
>
> On the other hand, the datanode is configured via dfs.datanode.max.xcievers
> to support 2048 read/write request simultanously, right?
>
> thanks,
> dhruba
>
>
> On Tue, Jan 26, 2010 at 7:10 AM, Kannan Muthukkaruppan
> <Ka...@facebook.com>wrote:
>
>>
>> Looking further up in the logs (about 20 minutes prior in the logs when
>> errors first started happening), I noticed the following.
>>
>> btw, ulimit -a shows that I have "open files" set to 64k. Is that not
>> sufficient?
>>
>> 2010-01-25 11:10:21,774 WARN
>> org.apache.hadoop.hdfs.server.datanode.DataNode: DatanodeRegistration(
>> 10.129.68.212:50010,
>> storageID=DS-1418567969-10.129.68.212-50010-1263610251776, infoPort=50075,
>> ipcPort=50020):Data\
>> XceiveServer: java.io.IOException: Too many open files
>>        at sun.nio.ch.ServerSocketChannelImpl.accept0(Native Method)
>>        at
>> sun.nio.ch.ServerSocketChannelImpl.accept(ServerSocketChannelImpl.java:145)
>>        at
>> sun.nio.ch.ServerSocketAdaptor.accept(ServerSocketAdaptor.java:84)
>>        at
>> org.apache.hadoop.hdfs.server.datanode.DataXceiverServer.run(DataXceiverServer.java:130)
>>         at java.lang.Thread.run(Thread.java:619)
>>
>> 2010-01-25 11:10:21,566 WARN
>> org.apache.hadoop.hdfs.server.datanode.DataNode: DatanodeRegistration(
>> 10.129.68.212:50010,
>> storageID=DS-1418567969-10.129.68.212-50010-1263610251776, infoPort=50075,
>> ipcPort=50020):Got \
>> exception while serving blk_3332344970774019423_10249 to /10.129.68.212:
>> java.io.FileNotFoundException:
>> /mnt/d1/HDFS-kannan1/current/subdir23/blk_3332344970774019423_10249.meta
>> (Too many open files)
>>        at java.io.FileInputStream.open(Native Method)
>>        at java.io.FileInputStream.<init>(FileInputStream.java:106)
>>        at
>> org.apache.hadoop.hdfs.server.datanode.FSDataset.getMetaDataInputStream(FSDataset.java:682)
>>        at
>> org.apache.hadoop.hdfs.server.datanode.BlockSender.<init>(BlockSender.java:97)
>>        at
>> org.apache.hadoop.hdfs.server.datanode.DataXceiver.readBlock(DataXceiver.java:172)
>>         at
>> org.apache.hadoop.hdfs.server.datanode.DataXceiver.run(DataXceiver.java:95)
>>        at java.lang.Thread.run(Thread.java:619)
>>
>>
>>
>> ________________________________________
>> From: Kannan Muthukkaruppan [Kannan@facebook.com]
>> Sent: Tuesday, January 26, 2010 7:01 AM
>> To: hbase-dev@hadoop.apache.org
>> Subject: RE: scanner lease expired/region server shutdown
>>
>> 1. Yes, it is a 5 node, setup.
>>
>> 1 Name Node/4 Data Nodes. Of the 4 DN, one is running the HBase Master, and
>> the other three are running region servers. ZK is on all the same 5 nodes.
>> Should ideally have separated this out. The nodes are 16GB, 4 disk machines.
>>
>> 2. I examined the HDFS datanode log on the same machine around that time
>> the problems happened, and saw this:
>>
>> 2010-01-25 11:33:09,531 WARN
>> org.apache.hadoop.hdfs.server.datanode.DataNode: DatanodeRegistration(
>> 10.129.68.212:50010,
>> storageID=DS-1418567969-10.129.68.212-50010-1263610251776, infoPort=50075,
>> ipcPort=\
>> 50020):Got exception while serving blk_5691809099673541164_10475 to /
>> 10.129.68.212:
>> java.net.SocketTimeoutException: 480000 millis timeout while waiting for
>> channel to be ready for write. ch :
>> java.nio.channels.SocketChannel[connected local=/10.129.68.212:50010remote=/
>> 10.129.68.212:477\
>> 29]
>>        at
>> org.apache.hadoop.net.SocketIOWithTimeout.waitForIO(SocketIOWithTimeout.java:246)
>>        at
>> org.apache.hadoop.net.SocketOutputStream.waitForWritable(SocketOutputStream.java:159)
>>        at
>> org.apache.hadoop.net.SocketOutputStream.transferToFully(SocketOutputStream.java:198)
>>        at
>> org.apache.hadoop.hdfs.server.datanode.BlockSender.sendChunks(BlockSender.java:313)
>>        at
>> org.apache.hadoop.hdfs.server.datanode.BlockSender.sendBlock(BlockSender.java:400)
>>        at
>> org.apache.hadoop.hdfs.server.datanode.DataXceiver.readBlock(DataXceiver.java:180)
>>        at
>> org.apache.hadoop.hdfs.server.datanode.DataXceiver.run(DataXceiver.java:95)
>>        at java.lang.Thread.run(Thread.java:619)
>>
>> [For future runs, will try to turn on Java GC logging as well as per your
>> suggestion.]
>>
>> 3.  You wrote <<< There is no line after the above?  A region with a
>> startkey of 0031841132?  >>>
>>
>> I just cut-paste a section of the PE log and the scan of the '.META.' from
>> shell. But there were many such regions for which the PE client reported
>> errors of the form:
>>
>> 10/01/25 12:31:37 DEBUG client.HConnectionManager$TableServers:
>> locateRegionInMeta attempt 1 of 10 failed; retrying after sleep of 1000
>> because: Connection refused
>> 10/01/25 12:31:37 DEBUG client.HConnectionManager$TableServers: Removed
>> .META.,,1 for tableName=.META. from cache because of
>> TestTable,0015411453,99999999999999
>> 10/01/25 12:31:37 DEBUG client.HConnectionManager$TableServers: Cached
>> location for .META.,,1 is 10.129.68.212:60020
>>
>> and scan of '.META.' revealed that those regions, such as the
>> 'TestTable,0015411453,99999999999999', had splitA & splitB portions in META.
>>
>> <<< Then its a dropped edit of .META.  The parent region -- the one being
>> split got updates which added the splitA and splitB and its offlining but
>> the new region didn't get inserted?  The crash happened just at that
>> time?>>>
>>
>> Didn't fully understand the above.
>>
>> <<< It would be interesting to take a look at the regionserver logs from
>> that time.  Please post if you have a moment so we can take a looksee.>>>
>>
>> Will do. Should I just send it as an attachment to the list? Or is their a
>> recommended way of doing this?
>>
>> regards,
>> Kannan
>> ________________________________________
>> From: saint.ack@gmail.com [saint.ack@gmail.com] On Behalf Of Stack [
>> stack@duboce.net]
>> Sent: Monday, January 25, 2010 8:59 PM
>> To: hbase-dev@hadoop.apache.org
>> Subject: Re: scanner lease expired/region server shutdown
>>
>> What J-D said but with ornamentation.  See below.
>>
>> On Mon, Jan 25, 2010 at 7:14 PM, Kannan Muthukkaruppan
>> <Ka...@facebook.com> wrote:
>> > I was doing some brute force testing - running one instance of
>> PerformanceEvaluation (PE) for writes, and another instance for randomReads.
>> >
>>
>> Whats the cluster your hitting like?  That 5-node thingy?  Whats the
>> hardware profile.
>>
>>
>> > One of the region servers went down after a while. [This is on 0.20.2].
>> The region server log had things like:
>> >
>> > 2010-01-25 11:33:39,416 WARN org.apache.hadoop.hbase.regionserver.HLog:
>> IPC Server handler 34 on 60020 took 65190ms appending an edit to hlog;
>> editcount=27878
>>
>> This is saying that it took 65 seconds to append to hdfs.  What was
>> going on at that time?  A fat GC in the regionserver or over in a
>> Datanode?  You can enable GC logging uncommenting stuff in the
>> hbase-env.sh.   Feed the GC log to https://gchisto.dev.java.net/
>> (suggested by the zookeeper lads).  Its good for finding the long
>> pauses.  We should find the logs around the long GC pause.  Its
>> probably a failed promotion that brought on the stop-the-world GC.  Or
>> your HDFS was struggling?
>>
>> I was going to say that the PE puts the worse kind of loading on the
>> hbase cache -- nothing sticks around -- but looking at your numbers
>> below, cache seems to be working pretty well.
>>
>> > 2010-01-25 11:33:39,416 DEBUG
>> org.apache.hadoop.hbase.io.hfile.LruBlockCache: Cache Stats: Sizes:
>> Total=1815.4441MB (1903631104), Free=581.23096MB (609464832),\
>> >  Max=2396.675MB (2513095936), Counts: Blocks=28216, Access=10904689,
>> Hit=9313612, Miss=1591077, Evictions=110, Evicted=414336, Ratios: Hit
>> Ratio=85.40924191474\
>> > 915%, Miss Ratio=14.590759575366974%, Evicted/Run=3766.69091796875
>> > 2010-01-25 11:33:39,417 WARN org.apache.hadoop.hbase.regionserver.HLog:
>> IPC Server handler 52 on 60020 took 65195ms appending an edit to hlog;
>> editcount=27879
>> > 2010-01-25 11:33:39,417 WARN org.apache.hadoop.hbase.regionserver.HLog:
>> IPC Server handler 75 on 60020 took 65195ms appending an edit to hlog;
>> editcount=27880
>> > 2010-01-25 11:33:39,417 INFO
>> org.apache.hadoop.hbase.regionserver.HRegionServer: Scanner
>> -5220035164735014996 lease expired
>> >
>> > This was followed by:
>> > 2010-01-25 11:33:39,636 WARN org.apache.zookeeper.ClientCnxn: Exception
>> closing session 0x12652c620520001 to sun.nio.ch.SelectionKeyImpl@788ab708
>> <ma...@788ab708>
>> > java.io.IOException: TIMED OUT
>>
>> This is the zk timeout.
>>
>> >
>> > And also errors like:
>> > 2010-01-25 11:33:39,692 ERROR
>> org.apache.hadoop.hbase.regionserver.HRegionServer:
>> > org.apache.hadoop.hbase.UnknownScannerException: Name:
>> -5220035164735014996
>> >
>>
>> These will happen after the above.  The regionserver is on its way
>> down.  Probably emptied the list of outstanding regions.
>>
>>
>> > Subsequently, I restarted the region server and just started the PE in
>> sequentialWrite mode. It seems to be off to a really slow start. For several
>> regions, it is printing the following:
>> >
>> > 10/01/25 18:12:44 DEBUG client.HConnectionManager$TableServers:
>> locateRegionInMeta attempt 0 of 10 failed; retrying after sleep of 1000
>> because: region offline: TestTable,0031841132,1264121653303
>> > 10/01/25 18:12:44 DEBUG client.HConnectionManager$TableServers: Removed
>> .META.,,1 for tableName=.META. from cache because of
>> TestTable,0032000000,99999999999999
>> > 10/01/25 18:12:44 DEBUG client.HConnectionManager$TableServers: Cached
>> location for .META.,,1 is 10.129.68.212:60020
>> > 10/01/25 18:12:45 DEBUG client.HConnectionManager$TableServers:
>> locateRegionInMeta attempt 1 of 10 failed; retrying after sleep of 1000
>> because: region offline: TestTable,0031841132,1264121653303
>> > 10/01/25 18:12:45 DEBUG client.HConnectionManager$TableServers: Removed
>> .META.,,1 for tableName=.META. from cache because of
>> TestTable,0032000000,99999999999999
>> > [..and so on..]
>> > 10/01/25 18:12:59 DEBUG client.HConnectionManager$TableServers: Cached
>> location for .META.,,1 is 10.129.68.212:60020
>> > 10/01/25 18:13:07 DEBUG client.HConnectionManager$TableServers:
>> locateRegionInMeta attempt 8 of 10 failed; retrying after sleep of 16000
>> because: region offline: TestTable,0031841132,1264121653303
>> > 10/01/25 18:13:07 DEBUG client.HConnectionManager$TableServers: Removed
>> .META.,,1 for tableName=.META. from cache because of
>> TestTable,0032000000,99999999999999
>> >
>> > When I scan the '.META.', I noticed that most of the regions for which I
>> get the above error were in the middle of (?) a split. For example:
>> >
>> > TestTable,0031841132,126412 column=info:regioninfo,
>> timestamp=1264470099796, value=REGION => {NAME => 'TestTa
>> >  1653303                     ble,0031841132,1264121653303', STARTKEY =>
>> '0031841132', ENDKEY => '0032057774',
>> >                             ENCODED => 644500792, OFFLINE => true, SPLIT
>> => true, TABLE => {{NAME => 'TestTab
>> >                             le', FAMILIES => [{NAME => 'info', VERSIONS
>> => '3', COMPRESSION => 'NONE', TTL =>
>> >                              '2147483647', BLOCKSIZE => '65536',
>> IN_MEMORY => 'false', BLOCKCACHE => 'true'}]
>> >                             }}
>> >  TestTable,0031841132,126412 column=info:server, timestamp=1264121655148,
>> value=10.129.68.214:60020
>> >  1653303
>> >  TestTable,0031841132,126412 column=info:serverstartcode,
>> timestamp=1264121655148, value=1264109117245
>> >  1653303
>> >  TestTable,0031841132,126412 column=info:splitA, timestamp=1264470099796,
>> value=\000\n0031978284\000\000\000\0
>> >  1653303
>> 01&g\016\357\277\275\024\"TestTable,0031841132,1264449401108\000\n0031841132\000\
>> >
>> 000\000\005\tTestTable\000\000\000\000\000\002\000\000\000\aIS_ROOT\000\000\000\0
>> >
>> 05false\000\000\000\aIS_META\000\000\000\005false\000\000\000\001\a\004info\000\0
>> >
>> 00\000\a\000\000\000\vBLOOMFILTER\000\000\000\005false\000\000\000\vCOMPRESSION\0
>> >
>> 00\000\000\004NONE\000\000\000\bVERSIONS\000\000\000\0013\000\000\000\003TTL\000\
>> >
>> 000\000\n2147483647\000\000\000\tBLOCKSIZE\000\000\000\00565536\000\000\000\tIN_M
>> >
>> EMORY\000\000\000\005false\000\000\000\nBLOCKCACHE\000\000\000\004true\357\277\27
>> >                             5\337\274#
>> >  TestTable,0031841132,126412 column=info:splitB, timestamp=1264470099796,
>> value=\000\n0032057774\000\000\000\0
>> >  1653303
>> 01&g\016\357\277\275\024\"TestTable,0031978284,1264449401108\000\n0031978284\000\
>> >
>> 000\000\005\tTestTable\000\000\000\000\000\002\000\000\000\aIS_ROOT\000\000\000\0
>> >
>> 05false\000\000\000\aIS_META\000\000\000\005false\000\000\000\001\a\004info\000\0
>> >
>> 00\000\a\000\000\000\vBLOOMFILTER\000\000\000\005false\000\000\000\vCOMPRESSION\0
>> >
>> 00\000\000\004NONE\000\000\000\bVERSIONS\000\000\000\0013\000\000\000\003TTL\000\
>> >
>> 000\000\n2147483647\000\000\000\tBLOCKSIZE\000\000\000\00565536\000\000\000\tIN_M
>> >
>> EMORY\000\000\000\005false\000\000\000\nBLOCKCACHE\000\000\000\004truej\357\277\2
>> >                             75\312\262
>> >
>> > Any thoughts/ideas on what might be going on? Appreciate any input in
>> helping understand what might be going on.
>> >
>>
>> There is no line after the above?  A region with a startkey of
>> 0031841132?  Then its a dropped edit of .META.  The parent region --
>> the one being split got updates which added the splitA and splitB and
>> its offlining but the new region didn't get inserted?  The crash
>> happened just at that time?
>>
>> It would be interesting to take a look at the regionserver logs from
>> that time.  Please post if you have a moment so we can take a looksee.
>>
>> Above kinda thing is what the master rewrite is about moving state
>> transitions up to zk so atomic over cluster moving regions through
>> transitions rather than as here, a multi-row update might not all go
>> through as things currently work.
>>
>> St.Ack
>>
>> > Regards,
>> > Kannan
>> >
>> >
>> >
>>
>
>
>
> --
> Connect to me at http://www.facebook.com/dhruba
>

Re: scanner lease expired/region server shutdown

Posted by Jean-Daniel Cryans <jd...@apache.org>.
One other source of Too many open files is if you have an old 2.6.27
http://pero.blogs.aprilmayjune.org/2009/01/22/hadoop-and-linux-kernel-2627-epoll-limits/

J-D

On Tue, Jan 26, 2010 at 12:58 PM, Kannan Muthukkaruppan
<Ka...@facebook.com> wrote:
> Dhruba: yes, the "Too many open files" exception is getting reported by the DN process. The same node is also running an HBase region server.
>
> And yes, I confirmed that the xcievers setting is 2048.
>
> Regards,
> Kannan
> -----Original Message-----
> From: Dhruba Borthakur [mailto:dhruba@gmail.com]
> Sent: Tuesday, January 26, 2010 10:10 AM
> To: hbase-dev@hadoop.apache.org
> Subject: Re: scanner lease expired/region server shutdown
>
> This exception is from the DataNode, right? This means that the datanode
> process has 32K files open simultaneously, how can that be? For each block
> read/write the datanode has two open files, one for the data and one for the
> .meta where the crc gets stored.
>
> On the other hand, the datanode is configured via dfs.datanode.max.xcievers
> to support 2048 read/write request simultanously, right?
>
> thanks,
> dhruba
>
>
> On Tue, Jan 26, 2010 at 7:10 AM, Kannan Muthukkaruppan
> <Ka...@facebook.com>wrote:
>
>>
>> Looking further up in the logs (about 20 minutes prior in the logs when
>> errors first started happening), I noticed the following.
>>
>> btw, ulimit -a shows that I have "open files" set to 64k. Is that not
>> sufficient?
>>
>> 2010-01-25 11:10:21,774 WARN
>> org.apache.hadoop.hdfs.server.datanode.DataNode: DatanodeRegistration(
>> 10.129.68.212:50010,
>> storageID=DS-1418567969-10.129.68.212-50010-1263610251776, infoPort=50075,
>> ipcPort=50020):Data\
>> XceiveServer: java.io.IOException: Too many open files
>>        at sun.nio.ch.ServerSocketChannelImpl.accept0(Native Method)
>>        at
>> sun.nio.ch.ServerSocketChannelImpl.accept(ServerSocketChannelImpl.java:145)
>>        at
>> sun.nio.ch.ServerSocketAdaptor.accept(ServerSocketAdaptor.java:84)
>>        at
>> org.apache.hadoop.hdfs.server.datanode.DataXceiverServer.run(DataXceiverServer.java:130)
>>         at java.lang.Thread.run(Thread.java:619)
>>
>> 2010-01-25 11:10:21,566 WARN
>> org.apache.hadoop.hdfs.server.datanode.DataNode: DatanodeRegistration(
>> 10.129.68.212:50010,
>> storageID=DS-1418567969-10.129.68.212-50010-1263610251776, infoPort=50075,
>> ipcPort=50020):Got \
>> exception while serving blk_3332344970774019423_10249 to /10.129.68.212:
>> java.io.FileNotFoundException:
>> /mnt/d1/HDFS-kannan1/current/subdir23/blk_3332344970774019423_10249.meta
>> (Too many open files)
>>        at java.io.FileInputStream.open(Native Method)
>>        at java.io.FileInputStream.<init>(FileInputStream.java:106)
>>        at
>> org.apache.hadoop.hdfs.server.datanode.FSDataset.getMetaDataInputStream(FSDataset.java:682)
>>        at
>> org.apache.hadoop.hdfs.server.datanode.BlockSender.<init>(BlockSender.java:97)
>>        at
>> org.apache.hadoop.hdfs.server.datanode.DataXceiver.readBlock(DataXceiver.java:172)
>>         at
>> org.apache.hadoop.hdfs.server.datanode.DataXceiver.run(DataXceiver.java:95)
>>        at java.lang.Thread.run(Thread.java:619)
>>
>>
>>
>> ________________________________________
>> From: Kannan Muthukkaruppan [Kannan@facebook.com]
>> Sent: Tuesday, January 26, 2010 7:01 AM
>> To: hbase-dev@hadoop.apache.org
>> Subject: RE: scanner lease expired/region server shutdown
>>
>> 1. Yes, it is a 5 node, setup.
>>
>> 1 Name Node/4 Data Nodes. Of the 4 DN, one is running the HBase Master, and
>> the other three are running region servers. ZK is on all the same 5 nodes.
>> Should ideally have separated this out. The nodes are 16GB, 4 disk machines.
>>
>> 2. I examined the HDFS datanode log on the same machine around that time
>> the problems happened, and saw this:
>>
>> 2010-01-25 11:33:09,531 WARN
>> org.apache.hadoop.hdfs.server.datanode.DataNode: DatanodeRegistration(
>> 10.129.68.212:50010,
>> storageID=DS-1418567969-10.129.68.212-50010-1263610251776, infoPort=50075,
>> ipcPort=\
>> 50020):Got exception while serving blk_5691809099673541164_10475 to /
>> 10.129.68.212:
>> java.net.SocketTimeoutException: 480000 millis timeout while waiting for
>> channel to be ready for write. ch :
>> java.nio.channels.SocketChannel[connected local=/10.129.68.212:50010remote=/
>> 10.129.68.212:477\
>> 29]
>>        at
>> org.apache.hadoop.net.SocketIOWithTimeout.waitForIO(SocketIOWithTimeout.java:246)
>>        at
>> org.apache.hadoop.net.SocketOutputStream.waitForWritable(SocketOutputStream.java:159)
>>        at
>> org.apache.hadoop.net.SocketOutputStream.transferToFully(SocketOutputStream.java:198)
>>        at
>> org.apache.hadoop.hdfs.server.datanode.BlockSender.sendChunks(BlockSender.java:313)
>>        at
>> org.apache.hadoop.hdfs.server.datanode.BlockSender.sendBlock(BlockSender.java:400)
>>        at
>> org.apache.hadoop.hdfs.server.datanode.DataXceiver.readBlock(DataXceiver.java:180)
>>        at
>> org.apache.hadoop.hdfs.server.datanode.DataXceiver.run(DataXceiver.java:95)
>>        at java.lang.Thread.run(Thread.java:619)
>>
>> [For future runs, will try to turn on Java GC logging as well as per your
>> suggestion.]
>>
>> 3.  You wrote <<< There is no line after the above?  A region with a
>> startkey of 0031841132?  >>>
>>
>> I just cut-paste a section of the PE log and the scan of the '.META.' from
>> shell. But there were many such regions for which the PE client reported
>> errors of the form:
>>
>> 10/01/25 12:31:37 DEBUG client.HConnectionManager$TableServers:
>> locateRegionInMeta attempt 1 of 10 failed; retrying after sleep of 1000
>> because: Connection refused
>> 10/01/25 12:31:37 DEBUG client.HConnectionManager$TableServers: Removed
>> .META.,,1 for tableName=.META. from cache because of
>> TestTable,0015411453,99999999999999
>> 10/01/25 12:31:37 DEBUG client.HConnectionManager$TableServers: Cached
>> location for .META.,,1 is 10.129.68.212:60020
>>
>> and scan of '.META.' revealed that those regions, such as the
>> 'TestTable,0015411453,99999999999999', had splitA & splitB portions in META.
>>
>> <<< Then its a dropped edit of .META.  The parent region -- the one being
>> split got updates which added the splitA and splitB and its offlining but
>> the new region didn't get inserted?  The crash happened just at that
>> time?>>>
>>
>> Didn't fully understand the above.
>>
>> <<< It would be interesting to take a look at the regionserver logs from
>> that time.  Please post if you have a moment so we can take a looksee.>>>
>>
>> Will do. Should I just send it as an attachment to the list? Or is their a
>> recommended way of doing this?
>>
>> regards,
>> Kannan
>> ________________________________________
>> From: saint.ack@gmail.com [saint.ack@gmail.com] On Behalf Of Stack [
>> stack@duboce.net]
>> Sent: Monday, January 25, 2010 8:59 PM
>> To: hbase-dev@hadoop.apache.org
>> Subject: Re: scanner lease expired/region server shutdown
>>
>> What J-D said but with ornamentation.  See below.
>>
>> On Mon, Jan 25, 2010 at 7:14 PM, Kannan Muthukkaruppan
>> <Ka...@facebook.com> wrote:
>> > I was doing some brute force testing - running one instance of
>> PerformanceEvaluation (PE) for writes, and another instance for randomReads.
>> >
>>
>> Whats the cluster your hitting like?  That 5-node thingy?  Whats the
>> hardware profile.
>>
>>
>> > One of the region servers went down after a while. [This is on 0.20.2].
>> The region server log had things like:
>> >
>> > 2010-01-25 11:33:39,416 WARN org.apache.hadoop.hbase.regionserver.HLog:
>> IPC Server handler 34 on 60020 took 65190ms appending an edit to hlog;
>> editcount=27878
>>
>> This is saying that it took 65 seconds to append to hdfs.  What was
>> going on at that time?  A fat GC in the regionserver or over in a
>> Datanode?  You can enable GC logging uncommenting stuff in the
>> hbase-env.sh.   Feed the GC log to https://gchisto.dev.java.net/
>> (suggested by the zookeeper lads).  Its good for finding the long
>> pauses.  We should find the logs around the long GC pause.  Its
>> probably a failed promotion that brought on the stop-the-world GC.  Or
>> your HDFS was struggling?
>>
>> I was going to say that the PE puts the worse kind of loading on the
>> hbase cache -- nothing sticks around -- but looking at your numbers
>> below, cache seems to be working pretty well.
>>
>> > 2010-01-25 11:33:39,416 DEBUG
>> org.apache.hadoop.hbase.io.hfile.LruBlockCache: Cache Stats: Sizes:
>> Total=1815.4441MB (1903631104), Free=581.23096MB (609464832),\
>> >  Max=2396.675MB (2513095936), Counts: Blocks=28216, Access=10904689,
>> Hit=9313612, Miss=1591077, Evictions=110, Evicted=414336, Ratios: Hit
>> Ratio=85.40924191474\
>> > 915%, Miss Ratio=14.590759575366974%, Evicted/Run=3766.69091796875
>> > 2010-01-25 11:33:39,417 WARN org.apache.hadoop.hbase.regionserver.HLog:
>> IPC Server handler 52 on 60020 took 65195ms appending an edit to hlog;
>> editcount=27879
>> > 2010-01-25 11:33:39,417 WARN org.apache.hadoop.hbase.regionserver.HLog:
>> IPC Server handler 75 on 60020 took 65195ms appending an edit to hlog;
>> editcount=27880
>> > 2010-01-25 11:33:39,417 INFO
>> org.apache.hadoop.hbase.regionserver.HRegionServer: Scanner
>> -5220035164735014996 lease expired
>> >
>> > This was followed by:
>> > 2010-01-25 11:33:39,636 WARN org.apache.zookeeper.ClientCnxn: Exception
>> closing session 0x12652c620520001 to sun.nio.ch.SelectionKeyImpl@788ab708
>> <ma...@788ab708>
>> > java.io.IOException: TIMED OUT
>>
>> This is the zk timeout.
>>
>> >
>> > And also errors like:
>> > 2010-01-25 11:33:39,692 ERROR
>> org.apache.hadoop.hbase.regionserver.HRegionServer:
>> > org.apache.hadoop.hbase.UnknownScannerException: Name:
>> -5220035164735014996
>> >
>>
>> These will happen after the above.  The regionserver is on its way
>> down.  Probably emptied the list of outstanding regions.
>>
>>
>> > Subsequently, I restarted the region server and just started the PE in
>> sequentialWrite mode. It seems to be off to a really slow start. For several
>> regions, it is printing the following:
>> >
>> > 10/01/25 18:12:44 DEBUG client.HConnectionManager$TableServers:
>> locateRegionInMeta attempt 0 of 10 failed; retrying after sleep of 1000
>> because: region offline: TestTable,0031841132,1264121653303
>> > 10/01/25 18:12:44 DEBUG client.HConnectionManager$TableServers: Removed
>> .META.,,1 for tableName=.META. from cache because of
>> TestTable,0032000000,99999999999999
>> > 10/01/25 18:12:44 DEBUG client.HConnectionManager$TableServers: Cached
>> location for .META.,,1 is 10.129.68.212:60020
>> > 10/01/25 18:12:45 DEBUG client.HConnectionManager$TableServers:
>> locateRegionInMeta attempt 1 of 10 failed; retrying after sleep of 1000
>> because: region offline: TestTable,0031841132,1264121653303
>> > 10/01/25 18:12:45 DEBUG client.HConnectionManager$TableServers: Removed
>> .META.,,1 for tableName=.META. from cache because of
>> TestTable,0032000000,99999999999999
>> > [..and so on..]
>> > 10/01/25 18:12:59 DEBUG client.HConnectionManager$TableServers: Cached
>> location for .META.,,1 is 10.129.68.212:60020
>> > 10/01/25 18:13:07 DEBUG client.HConnectionManager$TableServers:
>> locateRegionInMeta attempt 8 of 10 failed; retrying after sleep of 16000
>> because: region offline: TestTable,0031841132,1264121653303
>> > 10/01/25 18:13:07 DEBUG client.HConnectionManager$TableServers: Removed
>> .META.,,1 for tableName=.META. from cache because of
>> TestTable,0032000000,99999999999999
>> >
>> > When I scan the '.META.', I noticed that most of the regions for which I
>> get the above error were in the middle of (?) a split. For example:
>> >
>> > TestTable,0031841132,126412 column=info:regioninfo,
>> timestamp=1264470099796, value=REGION => {NAME => 'TestTa
>> >  1653303                     ble,0031841132,1264121653303', STARTKEY =>
>> '0031841132', ENDKEY => '0032057774',
>> >                             ENCODED => 644500792, OFFLINE => true, SPLIT
>> => true, TABLE => {{NAME => 'TestTab
>> >                             le', FAMILIES => [{NAME => 'info', VERSIONS
>> => '3', COMPRESSION => 'NONE', TTL =>
>> >                              '2147483647', BLOCKSIZE => '65536',
>> IN_MEMORY => 'false', BLOCKCACHE => 'true'}]
>> >                             }}
>> >  TestTable,0031841132,126412 column=info:server, timestamp=1264121655148,
>> value=10.129.68.214:60020
>> >  1653303
>> >  TestTable,0031841132,126412 column=info:serverstartcode,
>> timestamp=1264121655148, value=1264109117245
>> >  1653303
>> >  TestTable,0031841132,126412 column=info:splitA, timestamp=1264470099796,
>> value=\000\n0031978284\000\000\000\0
>> >  1653303
>> 01&g\016\357\277\275\024\"TestTable,0031841132,1264449401108\000\n0031841132\000\
>> >
>> 000\000\005\tTestTable\000\000\000\000\000\002\000\000\000\aIS_ROOT\000\000\000\0
>> >
>> 05false\000\000\000\aIS_META\000\000\000\005false\000\000\000\001\a\004info\000\0
>> >
>> 00\000\a\000\000\000\vBLOOMFILTER\000\000\000\005false\000\000\000\vCOMPRESSION\0
>> >
>> 00\000\000\004NONE\000\000\000\bVERSIONS\000\000\000\0013\000\000\000\003TTL\000\
>> >
>> 000\000\n2147483647\000\000\000\tBLOCKSIZE\000\000\000\00565536\000\000\000\tIN_M
>> >
>> EMORY\000\000\000\005false\000\000\000\nBLOCKCACHE\000\000\000\004true\357\277\27
>> >                             5\337\274#
>> >  TestTable,0031841132,126412 column=info:splitB, timestamp=1264470099796,
>> value=\000\n0032057774\000\000\000\0
>> >  1653303
>> 01&g\016\357\277\275\024\"TestTable,0031978284,1264449401108\000\n0031978284\000\
>> >
>> 000\000\005\tTestTable\000\000\000\000\000\002\000\000\000\aIS_ROOT\000\000\000\0
>> >
>> 05false\000\000\000\aIS_META\000\000\000\005false\000\000\000\001\a\004info\000\0
>> >
>> 00\000\a\000\000\000\vBLOOMFILTER\000\000\000\005false\000\000\000\vCOMPRESSION\0
>> >
>> 00\000\000\004NONE\000\000\000\bVERSIONS\000\000\000\0013\000\000\000\003TTL\000\
>> >
>> 000\000\n2147483647\000\000\000\tBLOCKSIZE\000\000\000\00565536\000\000\000\tIN_M
>> >
>> EMORY\000\000\000\005false\000\000\000\nBLOCKCACHE\000\000\000\004truej\357\277\2
>> >                             75\312\262
>> >
>> > Any thoughts/ideas on what might be going on? Appreciate any input in
>> helping understand what might be going on.
>> >
>>
>> There is no line after the above?  A region with a startkey of
>> 0031841132?  Then its a dropped edit of .META.  The parent region --
>> the one being split got updates which added the splitA and splitB and
>> its offlining but the new region didn't get inserted?  The crash
>> happened just at that time?
>>
>> It would be interesting to take a look at the regionserver logs from
>> that time.  Please post if you have a moment so we can take a looksee.
>>
>> Above kinda thing is what the master rewrite is about moving state
>> transitions up to zk so atomic over cluster moving regions through
>> transitions rather than as here, a multi-row update might not all go
>> through as things currently work.
>>
>> St.Ack
>>
>> > Regards,
>> > Kannan
>> >
>> >
>> >
>>
>
>
>
> --
> Connect to me at http://www.facebook.com/dhruba
>

RE: scanner lease expired/region server shutdown

Posted by Kannan Muthukkaruppan <Ka...@facebook.com>.
Dhruba: yes, the "Too many open files" exception is getting reported by the DN process. The same node is also running an HBase region server.

And yes, I confirmed that the xcievers setting is 2048.

Regards,
Kannan
-----Original Message-----
From: Dhruba Borthakur [mailto:dhruba@gmail.com]
Sent: Tuesday, January 26, 2010 10:10 AM
To: hbase-dev@hadoop.apache.org
Subject: Re: scanner lease expired/region server shutdown

This exception is from the DataNode, right? This means that the datanode
process has 32K files open simultaneously; how can that be? For each block
read/write, the datanode has two open files, one for the data and one for the
.meta file where the crc gets stored.

On the other hand, the datanode is configured via dfs.datanode.max.xcievers
to support 2048 read/write requests simultaneously, right?

thanks,
dhruba
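
To make the two-files-per-xceiver point concrete, a rough back-of-envelope (the one-socket-per-xceiver figure is an assumption, not from the thread):

    2048 xceivers x (2 files + 1 socket) ~= 6,144 descriptors

which is nowhere near a 64k ulimit, so if the DataNode really ran out of descriptors, the extra usage has to come from somewhere else (a leak, many idle sockets, or the per-epoll-instance kernel limit J-D points to elsewhere in the thread) rather than from ordinary block traffic.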


On Tue, Jan 26, 2010 at 7:10 AM, Kannan Muthukkaruppan
<Ka...@facebook.com>wrote:

>
> Looking further up in the logs (about 20 minutes prior in the logs when
> errors first started happening), I noticed the following.
>
> btw, ulimit -a shows that I have "open files" set to 64k. Is that not
> sufficient?
>
> 2010-01-25 11:10:21,774 WARN
> org.apache.hadoop.hdfs.server.datanode.DataNode: DatanodeRegistration(
> 10.129.68.212:50010,
> storageID=DS-1418567969-10.129.68.212-50010-1263610251776, infoPort=50075,
> ipcPort=50020):Data\
> XceiveServer: java.io.IOException: Too many open files
>        at sun.nio.ch.ServerSocketChannelImpl.accept0(Native Method)
>        at
> sun.nio.ch.ServerSocketChannelImpl.accept(ServerSocketChannelImpl.java:145)
>        at
> sun.nio.ch.ServerSocketAdaptor.accept(ServerSocketAdaptor.java:84)
>        at
> org.apache.hadoop.hdfs.server.datanode.DataXceiverServer.run(DataXceiverServer.java:130)
>         at java.lang.Thread.run(Thread.java:619)
>
> 2010-01-25 11:10:21,566 WARN
> org.apache.hadoop.hdfs.server.datanode.DataNode: DatanodeRegistration(
> 10.129.68.212:50010,
> storageID=DS-1418567969-10.129.68.212-50010-1263610251776, infoPort=50075,
> ipcPort=50020):Got \
> exception while serving blk_3332344970774019423_10249 to /10.129.68.212:
> java.io.FileNotFoundException:
> /mnt/d1/HDFS-kannan1/current/subdir23/blk_3332344970774019423_10249.meta
> (Too many open files)
>        at java.io.FileInputStream.open(Native Method)
>        at java.io.FileInputStream.<init>(FileInputStream.java:106)
>        at
> org.apache.hadoop.hdfs.server.datanode.FSDataset.getMetaDataInputStream(FSDataset.java:682)
>        at
> org.apache.hadoop.hdfs.server.datanode.BlockSender.<init>(BlockSender.java:97)
>        at
> org.apache.hadoop.hdfs.server.datanode.DataXceiver.readBlock(DataXceiver.java:172)
>         at
> org.apache.hadoop.hdfs.server.datanode.DataXceiver.run(DataXceiver.java:95)
>        at java.lang.Thread.run(Thread.java:619)
>
>
>
> ________________________________________
> From: Kannan Muthukkaruppan [Kannan@facebook.com]
> Sent: Tuesday, January 26, 2010 7:01 AM
> To: hbase-dev@hadoop.apache.org
> Subject: RE: scanner lease expired/region server shutdown
>
> 1. Yes, it is a 5 node, setup.
>
> 1 Name Node/4 Data Nodes. Of the 4 DN, one is running the HBase Master, and
> the other three are running region servers. ZK is on all the same 5 nodes.
> Should ideally have separated this out. The nodes are 16GB, 4 disk machines.
>
> 2. I examined the HDFS datanode log on the same machine around that time
> the problems happened, and saw this:
>
> 2010-01-25 11:33:09,531 WARN
> org.apache.hadoop.hdfs.server.datanode.DataNode: DatanodeRegistration(
> 10.129.68.212:50010,
> storageID=DS-1418567969-10.129.68.212-50010-1263610251776, infoPort=50075,
> ipcPort=\
> 50020):Got exception while serving blk_5691809099673541164_10475 to /
> 10.129.68.212:
> java.net.SocketTimeoutException: 480000 millis timeout while waiting for
> channel to be ready for write. ch :
> java.nio.channels.SocketChannel[connected local=/10.129.68.212:50010remote=/
> 10.129.68.212:477\
> 29]
>        at
> org.apache.hadoop.net.SocketIOWithTimeout.waitForIO(SocketIOWithTimeout.java:246)
>        at
> org.apache.hadoop.net.SocketOutputStream.waitForWritable(SocketOutputStream.java:159)
>        at
> org.apache.hadoop.net.SocketOutputStream.transferToFully(SocketOutputStream.java:198)
>        at
> org.apache.hadoop.hdfs.server.datanode.BlockSender.sendChunks(BlockSender.java:313)
>        at
> org.apache.hadoop.hdfs.server.datanode.BlockSender.sendBlock(BlockSender.java:400)
>        at
> org.apache.hadoop.hdfs.server.datanode.DataXceiver.readBlock(DataXceiver.java:180)
>        at
> org.apache.hadoop.hdfs.server.datanode.DataXceiver.run(DataXceiver.java:95)
>        at java.lang.Thread.run(Thread.java:619)
>
> [For future runs, will try to turn on Java GC logging as well as per your
> suggestion.]
>
> 3.  You wrote <<< There is no line after the above?  A region with a
> startkey of 0031841132?  >>>
>
> I just cut-paste a section of the PE log and the scan of the '.META.' from
> shell. But there were many such regions for which the PE client reported
> errors of the form:
>
> 10/01/25 12:31:37 DEBUG client.HConnectionManager$TableServers:
> locateRegionInMeta attempt 1 of 10 failed; retrying after sleep of 1000
> because: Connection refused
> 10/01/25 12:31:37 DEBUG client.HConnectionManager$TableServers: Removed
> .META.,,1 for tableName=.META. from cache because of
> TestTable,0015411453,99999999999999
> 10/01/25 12:31:37 DEBUG client.HConnectionManager$TableServers: Cached
> location for .META.,,1 is 10.129.68.212:60020
>
> and scan of '.META.' revealed that those regions, such as the
> 'TestTable,0015411453,99999999999999', had splitA & splitB portions in META.
>
> <<< Then its a dropped edit of .META.  The parent region -- the one being
> split got updates which added the splitA and splitB and its offlining but
> the new region didn't get inserted?  The crash happened just at that
> time?>>>
>
> Didn't fully understand the above.
>
> <<< It would be interesting to take a look at the regionserver logs from
> that time.  Please post if you have a moment so we can take a looksee.>>>
>
> Will do. Should I just send it as an attachment to the list? Or is their a
> recommended way of doing this?
>
> regards,
> Kannan
> ________________________________________
> From: saint.ack@gmail.com [saint.ack@gmail.com] On Behalf Of Stack [
> stack@duboce.net]
> Sent: Monday, January 25, 2010 8:59 PM
> To: hbase-dev@hadoop.apache.org
> Subject: Re: scanner lease expired/region server shutdown
>
> What J-D said but with ornamentation.  See below.
>
> On Mon, Jan 25, 2010 at 7:14 PM, Kannan Muthukkaruppan
> <Ka...@facebook.com> wrote:
> > I was doing some brute force testing - running one instance of
> PerformanceEvaluation (PE) for writes, and another instance for randomReads.
> >
>
> Whats the cluster your hitting like?  That 5-node thingy?  Whats the
> hardware profile.
>
>
> > One of the region servers went down after a while. [This is on 0.20.2].
> The region server log had things like:
> >
> > 2010-01-25 11:33:39,416 WARN org.apache.hadoop.hbase.regionserver.HLog:
> IPC Server handler 34 on 60020 took 65190ms appending an edit to hlog;
> editcount=27878
>
> This is saying that it took 65 seconds to append to hdfs.  What was
> going on at that time?  A fat GC in the regionserver or over in a
> Datanode?  You can enable GC logging uncommenting stuff in the
> hbase-env.sh.   Feed the GC log to https://gchisto.dev.java.net/
> (suggested by the zookeeper lads).  Its good for finding the long
> pauses.  We should find the logs around the long GC pause.  Its
> probably a failed promotion that brought on the stop-the-world GC.  Or
> your HDFS was struggling?
>
> I was going to say that the PE puts the worse kind of loading on the
> hbase cache -- nothing sticks around -- but looking at your numbers
> below, cache seems to be working pretty well.
>
> > 2010-01-25 11:33:39,416 DEBUG
> org.apache.hadoop.hbase.io.hfile.LruBlockCache: Cache Stats: Sizes:
> Total=1815.4441MB (1903631104), Free=581.23096MB (609464832),\
> >  Max=2396.675MB (2513095936), Counts: Blocks=28216, Access=10904689,
> Hit=9313612, Miss=1591077, Evictions=110, Evicted=414336, Ratios: Hit
> Ratio=85.40924191474\
> > 915%, Miss Ratio=14.590759575366974%, Evicted/Run=3766.69091796875
> > 2010-01-25 11:33:39,417 WARN org.apache.hadoop.hbase.regionserver.HLog:
> IPC Server handler 52 on 60020 took 65195ms appending an edit to hlog;
> editcount=27879
> > 2010-01-25 11:33:39,417 WARN org.apache.hadoop.hbase.regionserver.HLog:
> IPC Server handler 75 on 60020 took 65195ms appending an edit to hlog;
> editcount=27880
> > 2010-01-25 11:33:39,417 INFO
> org.apache.hadoop.hbase.regionserver.HRegionServer: Scanner
> -5220035164735014996 lease expired
> >
> > This was followed by:
> > 2010-01-25 11:33:39,636 WARN org.apache.zookeeper.ClientCnxn: Exception
> closing session 0x12652c620520001 to sun.nio.ch.SelectionKeyImpl@788ab708
> <ma...@788ab708>
> > java.io.IOException: TIMED OUT
>
> This is the zk timeout.
>
> >
> > And also errors like:
> > 2010-01-25 11:33:39,692 ERROR
> org.apache.hadoop.hbase.regionserver.HRegionServer:
> > org.apache.hadoop.hbase.UnknownScannerException: Name:
> -5220035164735014996
> >
>
> These will happen after the above.  The regionserver is on its way
> down.  Probably emptied the list of outstanding regions.
>
>
> > Subsequently, I restarted the region server and just started the PE in
> sequentialWrite mode. It seems to be off to a really slow start. For several
> regions, it is printing the following:
> >
> > 10/01/25 18:12:44 DEBUG client.HConnectionManager$TableServers:
> locateRegionInMeta attempt 0 of 10 failed; retrying after sleep of 1000
> because: region offline: TestTable,0031841132,1264121653303
> > 10/01/25 18:12:44 DEBUG client.HConnectionManager$TableServers: Removed
> .META.,,1 for tableName=.META. from cache because of
> TestTable,0032000000,99999999999999
> > 10/01/25 18:12:44 DEBUG client.HConnectionManager$TableServers: Cached
> location for .META.,,1 is 10.129.68.212:60020
> > 10/01/25 18:12:45 DEBUG client.HConnectionManager$TableServers:
> locateRegionInMeta attempt 1 of 10 failed; retrying after sleep of 1000
> because: region offline: TestTable,0031841132,1264121653303
> > 10/01/25 18:12:45 DEBUG client.HConnectionManager$TableServers: Removed
> .META.,,1 for tableName=.META. from cache because of
> TestTable,0032000000,99999999999999
> > [..and so on..]
> > 10/01/25 18:12:59 DEBUG client.HConnectionManager$TableServers: Cached
> location for .META.,,1 is 10.129.68.212:60020
> > 10/01/25 18:13:07 DEBUG client.HConnectionManager$TableServers:
> locateRegionInMeta attempt 8 of 10 failed; retrying after sleep of 16000
> because: region offline: TestTable,0031841132,1264121653303
> > 10/01/25 18:13:07 DEBUG client.HConnectionManager$TableServers: Removed
> .META.,,1 for tableName=.META. from cache because of
> TestTable,0032000000,99999999999999
> >
> > When I scan the '.META.', I noticed that most of the regions for which I
> get the above error were in the middle of (?) a split. For example:
> >
> > TestTable,0031841132,126412 column=info:regioninfo,
> timestamp=1264470099796, value=REGION => {NAME => 'TestTa
> >  1653303                     ble,0031841132,1264121653303', STARTKEY =>
> '0031841132', ENDKEY => '0032057774',
> >                             ENCODED => 644500792, OFFLINE => true, SPLIT
> => true, TABLE => {{NAME => 'TestTab
> >                             le', FAMILIES => [{NAME => 'info', VERSIONS
> => '3', COMPRESSION => 'NONE', TTL =>
> >                              '2147483647', BLOCKSIZE => '65536',
> IN_MEMORY => 'false', BLOCKCACHE => 'true'}]
> >                             }}
> >  TestTable,0031841132,126412 column=info:server, timestamp=1264121655148,
> value=10.129.68.214:60020
> >  1653303
> >  TestTable,0031841132,126412 column=info:serverstartcode,
> timestamp=1264121655148, value=1264109117245
> >  1653303
> >  TestTable,0031841132,126412 column=info:splitA, timestamp=1264470099796,
> value=\000\n0031978284\000\000\000\0
> >  1653303
> 01&g\016\357\277\275\024\"TestTable,0031841132,1264449401108\000\n0031841132\000\
> >
> 000\000\005\tTestTable\000\000\000\000\000\002\000\000\000\aIS_ROOT\000\000\000\0
> >
> 05false\000\000\000\aIS_META\000\000\000\005false\000\000\000\001\a\004info\000\0
> >
> 00\000\a\000\000\000\vBLOOMFILTER\000\000\000\005false\000\000\000\vCOMPRESSION\0
> >
> 00\000\000\004NONE\000\000\000\bVERSIONS\000\000\000\0013\000\000\000\003TTL\000\
> >
> 000\000\n2147483647\000\000\000\tBLOCKSIZE\000\000\000\00565536\000\000\000\tIN_M
> >
> EMORY\000\000\000\005false\000\000\000\nBLOCKCACHE\000\000\000\004true\357\277\27
> >                             5\337\274#
> >  TestTable,0031841132,126412 column=info:splitB, timestamp=1264470099796,
> value=\000\n0032057774\000\000\000\0
> >  1653303
> 01&g\016\357\277\275\024\"TestTable,0031978284,1264449401108\000\n0031978284\000\
> >
> 000\000\005\tTestTable\000\000\000\000\000\002\000\000\000\aIS_ROOT\000\000\000\0
> >
> 05false\000\000\000\aIS_META\000\000\000\005false\000\000\000\001\a\004info\000\0
> >
> 00\000\a\000\000\000\vBLOOMFILTER\000\000\000\005false\000\000\000\vCOMPRESSION\0
> >
> 00\000\000\004NONE\000\000\000\bVERSIONS\000\000\000\0013\000\000\000\003TTL\000\
> >
> 000\000\n2147483647\000\000\000\tBLOCKSIZE\000\000\000\00565536\000\000\000\tIN_M
> >
> EMORY\000\000\000\005false\000\000\000\nBLOCKCACHE\000\000\000\004truej\357\277\2
> >                             75\312\262
> >
> > Any thoughts/ideas on what might be going on? Appreciate any input in
> helping understand what might be going on.
> >
>
> There is no line after the above?  A region with a startkey of
> 0031841132?  Then its a dropped edit of .META.  The parent region --
> the one being split got updates which added the splitA and splitB and
> its offlining but the new region didn't get inserted?  The crash
> happened just at that time?
>
> It would be interesting to take a look at the regionserver logs from
> that time.  Please post if you have a moment so we can take a looksee.
>
> Above kinda thing is what the master rewrite is about moving state
> transitions up to zk so atomic over cluster moving regions through
> transitions rather than as here, a multi-row update might not all go
> through as things currently work.
>
> St.Ack
>
> > Regards,
> > Kannan
> >
> >
> >
>



--
Connect to me at http://www.facebook.com/dhruba

Re: scanner lease expired/region server shutdown

Posted by Stack <st...@duboce.net>.
On Tue, Jan 26, 2010 at 10:10 AM, Dhruba Borthakur <dh...@gmail.com> wrote:
> This exception is from the DataNode, right? This means that the datanode
> process has 32K files open simultaneously, how can that be? For each block
> read/write the datanode has two open files, one for the data and one for the
> .meta where the crc gets stored.
>
Yeah, it's on the datanode.  I was just being a little indirect in
saying check the regionserver log, because it prints out the ulimit
that the process sees.  If it's not 64k there, it's probably not 64k
for the DN either.
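
(Quick way to eyeball it -- assuming the usual script-managed logs;
adjust the path to wherever yours land:

  grep ulimit /path/to/hbase/logs/hbase-*-regionserver-*.log | head -1

That should spit back the "ulimit -n ..." line the start scripts print
near the top of the log.)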


> On the other hand, the datanode is configured via dfs.datanode.max.xcievers
> to support 2048 read/write requests simultaneously, right?
>

So, you'd think he'd see an xceiver complaint before "too many open files"
if he was really running into the 64k limit.
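
(A rough way to check would be to look at what the DN process itself is
holding, something like the following -- the pid lookup assumes jps is on
the path, and /proc/<pid>/limits needs a reasonably recent kernel:

  DN_PID=`jps | awk '/DataNode/ {print $1}'`
  cat /proc/$DN_PID/limits | grep -i 'open files'   # limit the running process actually got
  ls /proc/$DN_PID/fd | wc -l                       # descriptors currently open

If the open count is up near the limit while xceivers are capped at 2048,
it's likely not just block streams eating the descriptors.)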

St.Ack



> thanks,
> dhruba
>
>
> On Tue, Jan 26, 2010 at 7:10 AM, Kannan Muthukkaruppan
> <Ka...@facebook.com>wrote:
>
>>
>> Looking further up in the logs (about 20 minutes prior in the logs when
>> errors first started happening), I noticed the following.
>>
>> btw, ulimit -a shows that I have "open files" set to 64k. Is that not
>> sufficient?
>>
>> 2010-01-25 11:10:21,774 WARN
>> org.apache.hadoop.hdfs.server.datanode.DataNode: DatanodeRegistration(
>> 10.129.68.212:50010,
>> storageID=DS-1418567969-10.129.68.212-50010-1263610251776, infoPort=50075,
>> ipcPort=50020):Data\
>> XceiveServer: java.io.IOException: Too many open files
>>        at sun.nio.ch.ServerSocketChannelImpl.accept0(Native Method)
>>        at
>> sun.nio.ch.ServerSocketChannelImpl.accept(ServerSocketChannelImpl.java:145)
>>        at
>> sun.nio.ch.ServerSocketAdaptor.accept(ServerSocketAdaptor.java:84)
>>        at
>> org.apache.hadoop.hdfs.server.datanode.DataXceiverServer.run(DataXceiverServer.java:130)
>>         at java.lang.Thread.run(Thread.java:619)
>>
>> 2010-01-25 11:10:21,566 WARN
>> org.apache.hadoop.hdfs.server.datanode.DataNode: DatanodeRegistration(
>> 10.129.68.212:50010,
>> storageID=DS-1418567969-10.129.68.212-50010-1263610251776, infoPort=50075,
>> ipcPort=50020):Got \
>> exception while serving blk_3332344970774019423_10249 to /10.129.68.212:
>> java.io.FileNotFoundException:
>> /mnt/d1/HDFS-kannan1/current/subdir23/blk_3332344970774019423_10249.meta
>> (Too many open files)
>>        at java.io.FileInputStream.open(Native Method)
>>        at java.io.FileInputStream.<init>(FileInputStream.java:106)
>>        at
>> org.apache.hadoop.hdfs.server.datanode.FSDataset.getMetaDataInputStream(FSDataset.java:682)
>>        at
>> org.apache.hadoop.hdfs.server.datanode.BlockSender.<init>(BlockSender.java:97)
>>        at
>> org.apache.hadoop.hdfs.server.datanode.DataXceiver.readBlock(DataXceiver.java:172)
>>         at
>> org.apache.hadoop.hdfs.server.datanode.DataXceiver.run(DataXceiver.java:95)
>>        at java.lang.Thread.run(Thread.java:619)
>>
>>
>>
>> ________________________________________
>> From: Kannan Muthukkaruppan [Kannan@facebook.com]
>> Sent: Tuesday, January 26, 2010 7:01 AM
>> To: hbase-dev@hadoop.apache.org
>> Subject: RE: scanner lease expired/region server shutdown
>>
>> 1. Yes, it is a 5 node, setup.
>>
>> 1 Name Node/4 Data Nodes. Of the 4 DN, one is running the HBase Master, and
>> the other three are running region servers. ZK is on all the same 5 nodes.
>> Should ideally have separated this out. The nodes are 16GB, 4 disk machines.
>>
>> 2. I examined the HDFS datanode log on the same machine around that time
>> the problems happened, and saw this:
>>
>> 2010-01-25 11:33:09,531 WARN
>> org.apache.hadoop.hdfs.server.datanode.DataNode: DatanodeRegistration(
>> 10.129.68.212:50010,
>> storageID=DS-1418567969-10.129.68.212-50010-1263610251776, infoPort=50075,
>> ipcPort=\
>> 50020):Got exception while serving blk_5691809099673541164_10475 to /
>> 10.129.68.212:
>> java.net.SocketTimeoutException: 480000 millis timeout while waiting for
>> channel to be ready for write. ch :
>> java.nio.channels.SocketChannel[connected local=/10.129.68.212:50010remote=/
>> 10.129.68.212:477\
>> 29]
>>        at
>> org.apache.hadoop.net.SocketIOWithTimeout.waitForIO(SocketIOWithTimeout.java:246)
>>        at
>> org.apache.hadoop.net.SocketOutputStream.waitForWritable(SocketOutputStream.java:159)
>>        at
>> org.apache.hadoop.net.SocketOutputStream.transferToFully(SocketOutputStream.java:198)
>>        at
>> org.apache.hadoop.hdfs.server.datanode.BlockSender.sendChunks(BlockSender.java:313)
>>        at
>> org.apache.hadoop.hdfs.server.datanode.BlockSender.sendBlock(BlockSender.java:400)
>>        at
>> org.apache.hadoop.hdfs.server.datanode.DataXceiver.readBlock(DataXceiver.java:180)
>>        at
>> org.apache.hadoop.hdfs.server.datanode.DataXceiver.run(DataXceiver.java:95)
>>        at java.lang.Thread.run(Thread.java:619)
>>
>> [For future runs, will try to turn on Java GC logging as well as per your
>> suggestion.]
>>
>> 3.  You wrote <<< There is no line after the above?  A region with a
>> startkey of 0031841132?  >>>
>>
>> I just cut-paste a section of the PE log and the scan of the '.META.' from
>> shell. But there were many such regions for which the PE client reported
>> errors of the form:
>>
>> 10/01/25 12:31:37 DEBUG client.HConnectionManager$TableServers:
>> locateRegionInMeta attempt 1 of 10 failed; retrying after sleep of 1000
>> because: Connection refused
>> 10/01/25 12:31:37 DEBUG client.HConnectionManager$TableServers: Removed
>> .META.,,1 for tableName=.META. from cache because of
>> TestTable,0015411453,99999999999999
>> 10/01/25 12:31:37 DEBUG client.HConnectionManager$TableServers: Cached
>> location for .META.,,1 is 10.129.68.212:60020
>>
>> and scan of '.META.' revealed that those regions, such as the
>> 'TestTable,0015411453,99999999999999', had splitA & splitB portions in META.
>>
>> <<< Then its a dropped edit of .META.  The parent region -- the one being
>> split got updates which added the splitA and splitB and its offlining but
>> the new region didn't get inserted?  The crash happened just at that
>> time?>>>
>>
>> Didn't fully understand the above.
>>
>> <<< It would be interesting to take a look at the regionserver logs from
>> that time.  Please post if you have a moment so we can take a looksee.>>>
>>
>> Will do. Should I just send it as an attachment to the list? Or is their a
>> recommended way of doing this?
>>
>> regards,
>> Kannan
>> ________________________________________
>> From: saint.ack@gmail.com [saint.ack@gmail.com] On Behalf Of Stack [
>> stack@duboce.net]
>> Sent: Monday, January 25, 2010 8:59 PM
>> To: hbase-dev@hadoop.apache.org
>> Subject: Re: scanner lease expired/region server shutdown
>>
>> What J-D said but with ornamentation.  See below.
>>
>> On Mon, Jan 25, 2010 at 7:14 PM, Kannan Muthukkaruppan
>> <Ka...@facebook.com> wrote:
>> > I was doing some brute force testing - running one instance of
>> PerformanceEvaluation (PE) for writes, and another instance for randomReads.
>> >
>>
>> Whats the cluster your hitting like?  That 5-node thingy?  Whats the
>> hardware profile.
>>
>>
>> > One of the region servers went down after a while. [This is on 0.20.2].
>> The region server log had things like:
>> >
>> > 2010-01-25 11:33:39,416 WARN org.apache.hadoop.hbase.regionserver.HLog:
>> IPC Server handler 34 on 60020 took 65190ms appending an edit to hlog;
>> editcount=27878
>>
>> This is saying that it took 65 seconds to append to hdfs.  What was
>> going on at that time?  A fat GC in the regionserver or over in a
>> Datanode?  You can enable GC logging uncommenting stuff in the
>> hbase-env.sh.   Feed the GC log to https://gchisto.dev.java.net/
>> (suggested by the zookeeper lads).  Its good for finding the long
>> pauses.  We should find the logs around the long GC pause.  Its
>> probably a failed promotion that brought on the stop-the-world GC.  Or
>> your HDFS was struggling?
>>
>> I was going to say that the PE puts the worse kind of loading on the
>> hbase cache -- nothing sticks around -- but looking at your numbers
>> below, cache seems to be working pretty well.
>>
>> > 2010-01-25 11:33:39,416 DEBUG
>> org.apache.hadoop.hbase.io.hfile.LruBlockCache: Cache Stats: Sizes:
>> Total=1815.4441MB (1903631104), Free=581.23096MB (609464832),\
>> >  Max=2396.675MB (2513095936), Counts: Blocks=28216, Access=10904689,
>> Hit=9313612, Miss=1591077, Evictions=110, Evicted=414336, Ratios: Hit
>> Ratio=85.40924191474\
>> > 915%, Miss Ratio=14.590759575366974%, Evicted/Run=3766.69091796875
>> > 2010-01-25 11:33:39,417 WARN org.apache.hadoop.hbase.regionserver.HLog:
>> IPC Server handler 52 on 60020 took 65195ms appending an edit to hlog;
>> editcount=27879
>> > 2010-01-25 11:33:39,417 WARN org.apache.hadoop.hbase.regionserver.HLog:
>> IPC Server handler 75 on 60020 took 65195ms appending an edit to hlog;
>> editcount=27880
>> > 2010-01-25 11:33:39,417 INFO
>> org.apache.hadoop.hbase.regionserver.HRegionServer: Scanner
>> -5220035164735014996 lease expired
>> >
>> > This was followed by:
>> > 2010-01-25 11:33:39,636 WARN org.apache.zookeeper.ClientCnxn: Exception
>> closing session 0x12652c620520001 to sun.nio.ch.SelectionKeyImpl@788ab708
>> <ma...@788ab708>
>> > java.io.IOException: TIMED OUT
>>
>> This is the zk timeout.
>>
>> >
>> > And also errors like:
>> > 2010-01-25 11:33:39,692 ERROR
>> org.apache.hadoop.hbase.regionserver.HRegionServer:
>> > org.apache.hadoop.hbase.UnknownScannerException: Name:
>> -5220035164735014996
>> >
>>
>> These will happen after the above.  The regionserver is on its way
>> down.  Probably emptied the list of outstanding regions.
>>
>>
>> > Subsequently, I restarted the region server and just started the PE in
>> sequentialWrite mode. It seems to be off to a really slow start. For several
>> regions, it is printing the following:
>> >
>> > 10/01/25 18:12:44 DEBUG client.HConnectionManager$TableServers:
>> locateRegionInMeta attempt 0 of 10 failed; retrying after sleep of 1000
>> because: region offline: TestTable,0031841132,1264121653303
>> > 10/01/25 18:12:44 DEBUG client.HConnectionManager$TableServers: Removed
>> .META.,,1 for tableName=.META. from cache because of
>> TestTable,0032000000,99999999999999
>> > 10/01/25 18:12:44 DEBUG client.HConnectionManager$TableServers: Cached
>> location for .META.,,1 is 10.129.68.212:60020
>> > 10/01/25 18:12:45 DEBUG client.HConnectionManager$TableServers:
>> locateRegionInMeta attempt 1 of 10 failed; retrying after sleep of 1000
>> because: region offline: TestTable,0031841132,1264121653303
>> > 10/01/25 18:12:45 DEBUG client.HConnectionManager$TableServers: Removed
>> .META.,,1 for tableName=.META. from cache because of
>> TestTable,0032000000,99999999999999
>> > [..and so on..]
>> > 10/01/25 18:12:59 DEBUG client.HConnectionManager$TableServers: Cached
>> location for .META.,,1 is 10.129.68.212:60020
>> > 10/01/25 18:13:07 DEBUG client.HConnectionManager$TableServers:
>> locateRegionInMeta attempt 8 of 10 failed; retrying after sleep of 16000
>> because: region offline: TestTable,0031841132,1264121653303
>> > 10/01/25 18:13:07 DEBUG client.HConnectionManager$TableServers: Removed
>> .META.,,1 for tableName=.META. from cache because of
>> TestTable,0032000000,99999999999999
>> >
>> > When I scan the '.META.', I noticed that most of the regions for which I
>> get the above error were in the middle of (?) a split. For example:
>> >
>> > TestTable,0031841132,126412 column=info:regioninfo,
>> timestamp=1264470099796, value=REGION => {NAME => 'TestTa
>> >  1653303                     ble,0031841132,1264121653303', STARTKEY =>
>> '0031841132', ENDKEY => '0032057774',
>> >                             ENCODED => 644500792, OFFLINE => true, SPLIT
>> => true, TABLE => {{NAME => 'TestTab
>> >                             le', FAMILIES => [{NAME => 'info', VERSIONS
>> => '3', COMPRESSION => 'NONE', TTL =>
>> >                              '2147483647', BLOCKSIZE => '65536',
>> IN_MEMORY => 'false', BLOCKCACHE => 'true'}]
>> >                             }}
>> >  TestTable,0031841132,126412 column=info:server, timestamp=1264121655148,
>> value=10.129.68.214:60020
>> >  1653303
>> >  TestTable,0031841132,126412 column=info:serverstartcode,
>> timestamp=1264121655148, value=1264109117245
>> >  1653303
>> >  TestTable,0031841132,126412 column=info:splitA, timestamp=1264470099796,
>> value=\000\n0031978284\000\000\000\0
>> >  1653303
>> 01&g\016\357\277\275\024\"TestTable,0031841132,1264449401108\000\n0031841132\000\
>> >
>> 000\000\005\tTestTable\000\000\000\000\000\002\000\000\000\aIS_ROOT\000\000\000\0
>> >
>> 05false\000\000\000\aIS_META\000\000\000\005false\000\000\000\001\a\004info\000\0
>> >
>> 00\000\a\000\000\000\vBLOOMFILTER\000\000\000\005false\000\000\000\vCOMPRESSION\0
>> >
>> 00\000\000\004NONE\000\000\000\bVERSIONS\000\000\000\0013\000\000\000\003TTL\000\
>> >
>> 000\000\n2147483647\000\000\000\tBLOCKSIZE\000\000\000\00565536\000\000\000\tIN_M
>> >
>> EMORY\000\000\000\005false\000\000\000\nBLOCKCACHE\000\000\000\004true\357\277\27
>> >                             5\337\274#
>> >  TestTable,0031841132,126412 column=info:splitB, timestamp=1264470099796,
>> value=\000\n0032057774\000\000\000\0
>> >  1653303
>> 01&g\016\357\277\275\024\"TestTable,0031978284,1264449401108\000\n0031978284\000\
>> >
>> 000\000\005\tTestTable\000\000\000\000\000\002\000\000\000\aIS_ROOT\000\000\000\0
>> >
>> 05false\000\000\000\aIS_META\000\000\000\005false\000\000\000\001\a\004info\000\0
>> >
>> 00\000\a\000\000\000\vBLOOMFILTER\000\000\000\005false\000\000\000\vCOMPRESSION\0
>> >
>> 00\000\000\004NONE\000\000\000\bVERSIONS\000\000\000\0013\000\000\000\003TTL\000\
>> >
>> 000\000\n2147483647\000\000\000\tBLOCKSIZE\000\000\000\00565536\000\000\000\tIN_M
>> >
>> EMORY\000\000\000\005false\000\000\000\nBLOCKCACHE\000\000\000\004truej\357\277\2
>> >                             75\312\262
>> >
>> > Any thoughts/ideas on what might be going on? Appreciate any input in
>> helping understand what might be going on.
>> >
>>
>> There is no line after the above?  A region with a startkey of
>> 0031841132?  Then its a dropped edit of .META.  The parent region --
>> the one being split got updates which added the splitA and splitB and
>> its offlining but the new region didn't get inserted?  The crash
>> happened just at that time?
>>
>> It would be interesting to take a look at the regionserver logs from
>> that time.  Please post if you have a moment so we can take a looksee.
>>
>> Above kinda thing is what the master rewrite is about moving state
>> transitions up to zk so atomic over cluster moving regions through
>> transitions rather than as here, a multi-row update might not all go
>> through as things currently work.
>>
>> St.Ack
>>
>> > Regards,
>> > Kannan
>> >
>> >
>> >
>>
>
>
>
> --
> Connect to me at http://www.facebook.com/dhruba
>

Re: scanner lease expired/region server shutdown

Posted by Dhruba Borthakur <dh...@gmail.com>.
This exception is from the DataNode, right? This means that the datanode
process has 32K files open simultaneously, how can that be? For each block
read/write the datanode has two open files, one for the data and one for the
.meta where the crc gets stored.

On the other hand, the datanode is configured via dfs.datanode.max.xcievers
to support 2048 read/write requests simultaneously, right?
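
For reference, that cap would be coming from something like this in
hdfs-site.xml (or hadoop-site.xml, depending on the version) -- 2048 here
is just the number mentioned above, use whatever the cluster actually has:

  <property>
    <name>dfs.datanode.max.xcievers</name>
    <value>2048</value>
  </property>

At two files per active xceiver that is only on the order of 4K descriptors
plus sockets, nowhere near 32K, which is what makes the "Too many open
files" so odd.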

thanks,
dhruba


On Tue, Jan 26, 2010 at 7:10 AM, Kannan Muthukkaruppan
<Ka...@facebook.com>wrote:

>
> Looking further up in the logs (about 20 minutes prior in the logs when
> errors first started happening), I noticed the following.
>
> btw, ulimit -a shows that I have "open files" set to 64k. Is that not
> sufficient?
>
> 2010-01-25 11:10:21,774 WARN
> org.apache.hadoop.hdfs.server.datanode.DataNode: DatanodeRegistration(
> 10.129.68.212:50010,
> storageID=DS-1418567969-10.129.68.212-50010-1263610251776, infoPort=50075,
> ipcPort=50020):Data\
> XceiveServer: java.io.IOException: Too many open files
>        at sun.nio.ch.ServerSocketChannelImpl.accept0(Native Method)
>        at
> sun.nio.ch.ServerSocketChannelImpl.accept(ServerSocketChannelImpl.java:145)
>        at
> sun.nio.ch.ServerSocketAdaptor.accept(ServerSocketAdaptor.java:84)
>        at
> org.apache.hadoop.hdfs.server.datanode.DataXceiverServer.run(DataXceiverServer.java:130)
>         at java.lang.Thread.run(Thread.java:619)
>
> 2010-01-25 11:10:21,566 WARN
> org.apache.hadoop.hdfs.server.datanode.DataNode: DatanodeRegistration(
> 10.129.68.212:50010,
> storageID=DS-1418567969-10.129.68.212-50010-1263610251776, infoPort=50075,
> ipcPort=50020):Got \
> exception while serving blk_3332344970774019423_10249 to /10.129.68.212:
> java.io.FileNotFoundException:
> /mnt/d1/HDFS-kannan1/current/subdir23/blk_3332344970774019423_10249.meta
> (Too many open files)
>        at java.io.FileInputStream.open(Native Method)
>        at java.io.FileInputStream.<init>(FileInputStream.java:106)
>        at
> org.apache.hadoop.hdfs.server.datanode.FSDataset.getMetaDataInputStream(FSDataset.java:682)
>        at
> org.apache.hadoop.hdfs.server.datanode.BlockSender.<init>(BlockSender.java:97)
>        at
> org.apache.hadoop.hdfs.server.datanode.DataXceiver.readBlock(DataXceiver.java:172)
>         at
> org.apache.hadoop.hdfs.server.datanode.DataXceiver.run(DataXceiver.java:95)
>        at java.lang.Thread.run(Thread.java:619)
>
>
>
> ________________________________________
> From: Kannan Muthukkaruppan [Kannan@facebook.com]
> Sent: Tuesday, January 26, 2010 7:01 AM
> To: hbase-dev@hadoop.apache.org
> Subject: RE: scanner lease expired/region server shutdown
>
> 1. Yes, it is a 5 node, setup.
>
> 1 Name Node/4 Data Nodes. Of the 4 DN, one is running the HBase Master, and
> the other three are running region servers. ZK is on all the same 5 nodes.
> Should ideally have separated this out. The nodes are 16GB, 4 disk machines.
>
> 2. I examined the HDFS datanode log on the same machine around that time
> the problems happened, and saw this:
>
> 2010-01-25 11:33:09,531 WARN
> org.apache.hadoop.hdfs.server.datanode.DataNode: DatanodeRegistration(
> 10.129.68.212:50010,
> storageID=DS-1418567969-10.129.68.212-50010-1263610251776, infoPort=50075,
> ipcPort=\
> 50020):Got exception while serving blk_5691809099673541164_10475 to /
> 10.129.68.212:
> java.net.SocketTimeoutException: 480000 millis timeout while waiting for
> channel to be ready for write. ch :
> java.nio.channels.SocketChannel[connected local=/10.129.68.212:50010remote=/
> 10.129.68.212:477\
> 29]
>        at
> org.apache.hadoop.net.SocketIOWithTimeout.waitForIO(SocketIOWithTimeout.java:246)
>        at
> org.apache.hadoop.net.SocketOutputStream.waitForWritable(SocketOutputStream.java:159)
>        at
> org.apache.hadoop.net.SocketOutputStream.transferToFully(SocketOutputStream.java:198)
>        at
> org.apache.hadoop.hdfs.server.datanode.BlockSender.sendChunks(BlockSender.java:313)
>        at
> org.apache.hadoop.hdfs.server.datanode.BlockSender.sendBlock(BlockSender.java:400)
>        at
> org.apache.hadoop.hdfs.server.datanode.DataXceiver.readBlock(DataXceiver.java:180)
>        at
> org.apache.hadoop.hdfs.server.datanode.DataXceiver.run(DataXceiver.java:95)
>        at java.lang.Thread.run(Thread.java:619)
>
> [For future runs, will try to turn on Java GC logging as well as per your
> suggestion.]
>
> 3.  You wrote <<< There is no line after the above?  A region with a
> startkey of 0031841132?  >>>
>
> I just cut-paste a section of the PE log and the scan of the '.META.' from
> shell. But there were many such regions for which the PE client reported
> errors of the form:
>
> 10/01/25 12:31:37 DEBUG client.HConnectionManager$TableServers:
> locateRegionInMeta attempt 1 of 10 failed; retrying after sleep of 1000
> because: Connection refused
> 10/01/25 12:31:37 DEBUG client.HConnectionManager$TableServers: Removed
> .META.,,1 for tableName=.META. from cache because of
> TestTable,0015411453,99999999999999
> 10/01/25 12:31:37 DEBUG client.HConnectionManager$TableServers: Cached
> location for .META.,,1 is 10.129.68.212:60020
>
> and scan of '.META.' revealed that those regions, such as the
> 'TestTable,0015411453,99999999999999', had splitA & splitB portions in META.
>
> <<< Then its a dropped edit of .META.  The parent region -- the one being
> split got updates which added the splitA and splitB and its offlining but
> the new region didn't get inserted?  The crash happened just at that
> time?>>>
>
> Didn't fully understand the above.
>
> <<< It would be interesting to take a look at the regionserver logs from
> that time.  Please post if you have a moment so we can take a looksee.>>>
>
> Will do. Should I just send it as an attachment to the list? Or is their a
> recommended way of doing this?
>
> regards,
> Kannan
> ________________________________________
> From: saint.ack@gmail.com [saint.ack@gmail.com] On Behalf Of Stack [
> stack@duboce.net]
> Sent: Monday, January 25, 2010 8:59 PM
> To: hbase-dev@hadoop.apache.org
> Subject: Re: scanner lease expired/region server shutdown
>
> What J-D said but with ornamentation.  See below.
>
> On Mon, Jan 25, 2010 at 7:14 PM, Kannan Muthukkaruppan
> <Ka...@facebook.com> wrote:
> > I was doing some brute force testing - running one instance of
> PerformanceEvaluation (PE) for writes, and another instance for randomReads.
> >
>
> Whats the cluster your hitting like?  That 5-node thingy?  Whats the
> hardware profile.
>
>
> > One of the region servers went down after a while. [This is on 0.20.2].
> The region server log had things like:
> >
> > 2010-01-25 11:33:39,416 WARN org.apache.hadoop.hbase.regionserver.HLog:
> IPC Server handler 34 on 60020 took 65190ms appending an edit to hlog;
> editcount=27878
>
> This is saying that it took 65 seconds to append to hdfs.  What was
> going on at that time?  A fat GC in the regionserver or over in a
> Datanode?  You can enable GC logging uncommenting stuff in the
> hbase-env.sh.   Feed the GC log to https://gchisto.dev.java.net/
> (suggested by the zookeeper lads).  Its good for finding the long
> pauses.  We should find the logs around the long GC pause.  Its
> probably a failed promotion that brought on the stop-the-world GC.  Or
> your HDFS was struggling?
>
> I was going to say that the PE puts the worse kind of loading on the
> hbase cache -- nothing sticks around -- but looking at your numbers
> below, cache seems to be working pretty well.
>
> > 2010-01-25 11:33:39,416 DEBUG
> org.apache.hadoop.hbase.io.hfile.LruBlockCache: Cache Stats: Sizes:
> Total=1815.4441MB (1903631104), Free=581.23096MB (609464832),\
> >  Max=2396.675MB (2513095936), Counts: Blocks=28216, Access=10904689,
> Hit=9313612, Miss=1591077, Evictions=110, Evicted=414336, Ratios: Hit
> Ratio=85.40924191474\
> > 915%, Miss Ratio=14.590759575366974%, Evicted/Run=3766.69091796875
> > 2010-01-25 11:33:39,417 WARN org.apache.hadoop.hbase.regionserver.HLog:
> IPC Server handler 52 on 60020 took 65195ms appending an edit to hlog;
> editcount=27879
> > 2010-01-25 11:33:39,417 WARN org.apache.hadoop.hbase.regionserver.HLog:
> IPC Server handler 75 on 60020 took 65195ms appending an edit to hlog;
> editcount=27880
> > 2010-01-25 11:33:39,417 INFO
> org.apache.hadoop.hbase.regionserver.HRegionServer: Scanner
> -5220035164735014996 lease expired
> >
> > This was followed by:
> > 2010-01-25 11:33:39,636 WARN org.apache.zookeeper.ClientCnxn: Exception
> closing session 0x12652c620520001 to sun.nio.ch.SelectionKeyImpl@788ab708
> <ma...@788ab708>
> > java.io.IOException: TIMED OUT
>
> This is the zk timeout.
>
> >
> > And also errors like:
> > 2010-01-25 11:33:39,692 ERROR
> org.apache.hadoop.hbase.regionserver.HRegionServer:
> > org.apache.hadoop.hbase.UnknownScannerException: Name:
> -5220035164735014996
> >
>
> These will happen after the above.  The regionserver is on its way
> down.  Probably emptied the list of outstanding regions.
>
>
> > Subsequently, I restarted the region server and just started the PE in
> sequentialWrite mode. It seems to be off to a really slow start. For several
> regions, it is printing the following:
> >
> > 10/01/25 18:12:44 DEBUG client.HConnectionManager$TableServers:
> locateRegionInMeta attempt 0 of 10 failed; retrying after sleep of 1000
> because: region offline: TestTable,0031841132,1264121653303
> > 10/01/25 18:12:44 DEBUG client.HConnectionManager$TableServers: Removed
> .META.,,1 for tableName=.META. from cache because of
> TestTable,0032000000,99999999999999
> > 10/01/25 18:12:44 DEBUG client.HConnectionManager$TableServers: Cached
> location for .META.,,1 is 10.129.68.212:60020
> > 10/01/25 18:12:45 DEBUG client.HConnectionManager$TableServers:
> locateRegionInMeta attempt 1 of 10 failed; retrying after sleep of 1000
> because: region offline: TestTable,0031841132,1264121653303
> > 10/01/25 18:12:45 DEBUG client.HConnectionManager$TableServers: Removed
> .META.,,1 for tableName=.META. from cache because of
> TestTable,0032000000,99999999999999
> > [..and so on..]
> > 10/01/25 18:12:59 DEBUG client.HConnectionManager$TableServers: Cached
> location for .META.,,1 is 10.129.68.212:60020
> > 10/01/25 18:13:07 DEBUG client.HConnectionManager$TableServers:
> locateRegionInMeta attempt 8 of 10 failed; retrying after sleep of 16000
> because: region offline: TestTable,0031841132,1264121653303
> > 10/01/25 18:13:07 DEBUG client.HConnectionManager$TableServers: Removed
> .META.,,1 for tableName=.META. from cache because of
> TestTable,0032000000,99999999999999
> >
> > When I scan the '.META.', I noticed that most of the regions for which I
> get the above error were in the middle of (?) a split. For example:
> >
> > TestTable,0031841132,126412 column=info:regioninfo,
> timestamp=1264470099796, value=REGION => {NAME => 'TestTa
> >  1653303                     ble,0031841132,1264121653303', STARTKEY =>
> '0031841132', ENDKEY => '0032057774',
> >                             ENCODED => 644500792, OFFLINE => true, SPLIT
> => true, TABLE => {{NAME => 'TestTab
> >                             le', FAMILIES => [{NAME => 'info', VERSIONS
> => '3', COMPRESSION => 'NONE', TTL =>
> >                              '2147483647', BLOCKSIZE => '65536',
> IN_MEMORY => 'false', BLOCKCACHE => 'true'}]
> >                             }}
> >  TestTable,0031841132,126412 column=info:server, timestamp=1264121655148,
> value=10.129.68.214:60020
> >  1653303
> >  TestTable,0031841132,126412 column=info:serverstartcode,
> timestamp=1264121655148, value=1264109117245
> >  1653303
> >  TestTable,0031841132,126412 column=info:splitA, timestamp=1264470099796,
> value=\000\n0031978284\000\000\000\0
> >  1653303
> 01&g\016\357\277\275\024\"TestTable,0031841132,1264449401108\000\n0031841132\000\
> >
> 000\000\005\tTestTable\000\000\000\000\000\002\000\000\000\aIS_ROOT\000\000\000\0
> >
> 05false\000\000\000\aIS_META\000\000\000\005false\000\000\000\001\a\004info\000\0
> >
> 00\000\a\000\000\000\vBLOOMFILTER\000\000\000\005false\000\000\000\vCOMPRESSION\0
> >
> 00\000\000\004NONE\000\000\000\bVERSIONS\000\000\000\0013\000\000\000\003TTL\000\
> >
> 000\000\n2147483647\000\000\000\tBLOCKSIZE\000\000\000\00565536\000\000\000\tIN_M
> >
> EMORY\000\000\000\005false\000\000\000\nBLOCKCACHE\000\000\000\004true\357\277\27
> >                             5\337\274#
> >  TestTable,0031841132,126412 column=info:splitB, timestamp=1264470099796,
> value=\000\n0032057774\000\000\000\0
> >  1653303
> 01&g\016\357\277\275\024\"TestTable,0031978284,1264449401108\000\n0031978284\000\
> >
> 000\000\005\tTestTable\000\000\000\000\000\002\000\000\000\aIS_ROOT\000\000\000\0
> >
> 05false\000\000\000\aIS_META\000\000\000\005false\000\000\000\001\a\004info\000\0
> >
> 00\000\a\000\000\000\vBLOOMFILTER\000\000\000\005false\000\000\000\vCOMPRESSION\0
> >
> 00\000\000\004NONE\000\000\000\bVERSIONS\000\000\000\0013\000\000\000\003TTL\000\
> >
> 000\000\n2147483647\000\000\000\tBLOCKSIZE\000\000\000\00565536\000\000\000\tIN_M
> >
> EMORY\000\000\000\005false\000\000\000\nBLOCKCACHE\000\000\000\004truej\357\277\2
> >                             75\312\262
> >
> > Any thoughts/ideas on what might be going on? Appreciate any input in
> helping understand what might be going on.
> >
>
> There is no line after the above?  A region with a startkey of
> 0031841132?  Then its a dropped edit of .META.  The parent region --
> the one being split got updates which added the splitA and splitB and
> its offlining but the new region didn't get inserted?  The crash
> happened just at that time?
>
> It would be interesting to take a look at the regionserver logs from
> that time.  Please post if you have a moment so we can take a looksee.
>
> Above kinda thing is what the master rewrite is about moving state
> transitions up to zk so atomic over cluster moving regions through
> transitions rather than as here, a multi-row update might not all go
> through as things currently work.
>
> St.Ack
>
> > Regards,
> > Kannan
> >
> >
> >
>



-- 
Connect to me at http://www.facebook.com/dhruba

Re: scanner lease expired/region server shutdown

Posted by Stack <st...@duboce.net>.
On Tue, Jan 26, 2010 at 12:56 PM, Kannan Muthukkaruppan
<Ka...@facebook.com> wrote:

> 2) region server log does report ulimit of 64k.
>

Good.  Then we need to figure out why it's not effective, at least for the
DNs -- something like a kernel issue, as per the J-D pointer.
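
The usual suspects on the config side, assuming a RedHat-ish box and that
the daemons run as a 'hadoop' user (swap in whatever user you actually run
as):

  # /etc/security/limits.conf
  hadoop  soft  nofile  65536
  hadoop  hard  nofile  65536

  # and in /etc/pam.d/login (or sshd, if you start daemons over ssh):
  session  required  pam_limits.so

...then log back in and restart the daemons so the new limit is actually
picked up.  Otherwise the shell you ran 'ulimit -a' in can show 64k while
the DN itself is still sitting at the default 1024.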


> 3) An aside. Still using -XX:+CMSIncrementalMode. But I recall reading that for machines with more than 2 cores, the recommendation is to not use the incremental mode.

Yes.  Here is the paragraph we quote:
http://java.sun.com/javase/technologies/hotspot/gc/gc_tuning_6.html#icms.
My guess is that it's probably just ignored on a server-class machine
like yours.

You fellas probably have a bunch of GC experts in-house.  Grab one and
show them your logs, or post them here and we can take a looksee.

St.Ack


>
> -----Original Message-----
> From: saint.ack@gmail.com [mailto:saint.ack@gmail.com] On Behalf Of Stack
> Sent: Tuesday, January 26, 2010 8:32 AM
> To: hbase-dev@hadoop.apache.org
> Subject: Re: scanner lease expired/region server shutdown
>
> This is suspicious.  Make sure that second line in master or
> regionserver log where it prints out ulimit for the process has 64k
> (and not 1024).  You may not have set it for the DN user or setting it
> on your linux may be ornery (See
> http://wiki.apache.org/hadoop/Hbase/Troubleshooting#A6 for note about
> red hat based systems have upper bound for system, or here, where an
> interesting issue around kernel limits:
> http://wiki.apache.org/hadoop/Hbase/FAQ#A6).
> St.Ack
>
> On Tue, Jan 26, 2010 at 7:10 AM, Kannan Muthukkaruppan
> <Ka...@facebook.com> wrote:
>>
>> Looking further up in the logs (about 20 minutes prior in the logs when errors first started happening), I noticed the following.
>>
>> btw, ulimit -a shows that I have "open files" set to 64k. Is that not sufficient?
>>
>> 2010-01-25 11:10:21,774 WARN org.apache.hadoop.hdfs.server.datanode.DataNode: DatanodeRegistration(10.129.68.212:50010, storageID=DS-1418567969-10.129.68.212-50010-1263610251776, infoPort=50075, ipcPort=50020):Data\
>> XceiveServer: java.io.IOException: Too many open files
>>        at sun.nio.ch.ServerSocketChannelImpl.accept0(Native Method)
>>        at sun.nio.ch.ServerSocketChannelImpl.accept(ServerSocketChannelImpl.java:145)
>>        at sun.nio.ch.ServerSocketAdaptor.accept(ServerSocketAdaptor.java:84)
>>        at org.apache.hadoop.hdfs.server.datanode.DataXceiverServer.run(DataXceiverServer.java:130)
>>        at java.lang.Thread.run(Thread.java:619)
>>
>> 2010-01-25 11:10:21,566 WARN org.apache.hadoop.hdfs.server.datanode.DataNode: DatanodeRegistration(10.129.68.212:50010, storageID=DS-1418567969-10.129.68.212-50010-1263610251776, infoPort=50075, ipcPort=50020):Got \
>> exception while serving blk_3332344970774019423_10249 to /10.129.68.212:
>> java.io.FileNotFoundException: /mnt/d1/HDFS-kannan1/current/subdir23/blk_3332344970774019423_10249.meta (Too many open files)
>>        at java.io.FileInputStream.open(Native Method)
>>        at java.io.FileInputStream.<init>(FileInputStream.java:106)
>>        at org.apache.hadoop.hdfs.server.datanode.FSDataset.getMetaDataInputStream(FSDataset.java:682)
>>        at org.apache.hadoop.hdfs.server.datanode.BlockSender.<init>(BlockSender.java:97)
>>        at org.apache.hadoop.hdfs.server.datanode.DataXceiver.readBlock(DataXceiver.java:172)
>>        at org.apache.hadoop.hdfs.server.datanode.DataXceiver.run(DataXceiver.java:95)
>>        at java.lang.Thread.run(Thread.java:619)
>>
>>
>>
>> ________________________________________
>> From: Kannan Muthukkaruppan [Kannan@facebook.com]
>> Sent: Tuesday, January 26, 2010 7:01 AM
>> To: hbase-dev@hadoop.apache.org
>> Subject: RE: scanner lease expired/region server shutdown
>>
>> 1. Yes, it is a 5 node, setup.
>>
>> 1 Name Node/4 Data Nodes. Of the 4 DN, one is running the HBase Master, and the other three are running region servers. ZK is on all the same 5 nodes. Should ideally have separated this out. The nodes are 16GB, 4 disk machines.
>>
>> 2. I examined the HDFS datanode log on the same machine around that time the problems happened, and saw this:
>>
>> 2010-01-25 11:33:09,531 WARN org.apache.hadoop.hdfs.server.datanode.DataNode: DatanodeRegistration(10.129.68.212:50010, storageID=DS-1418567969-10.129.68.212-50010-1263610251776, infoPort=50075, ipcPort=\
>> 50020):Got exception while serving blk_5691809099673541164_10475 to /10.129.68.212:
>> java.net.SocketTimeoutException: 480000 millis timeout while waiting for channel to be ready for write. ch : java.nio.channels.SocketChannel[connected local=/10.129.68.212:50010 remote=/10.129.68.212:477\
>> 29]
>>        at org.apache.hadoop.net.SocketIOWithTimeout.waitForIO(SocketIOWithTimeout.java:246)
>>        at org.apache.hadoop.net.SocketOutputStream.waitForWritable(SocketOutputStream.java:159)
>>        at org.apache.hadoop.net.SocketOutputStream.transferToFully(SocketOutputStream.java:198)
>>        at org.apache.hadoop.hdfs.server.datanode.BlockSender.sendChunks(BlockSender.java:313)
>>        at org.apache.hadoop.hdfs.server.datanode.BlockSender.sendBlock(BlockSender.java:400)
>>        at org.apache.hadoop.hdfs.server.datanode.DataXceiver.readBlock(DataXceiver.java:180)
>>        at org.apache.hadoop.hdfs.server.datanode.DataXceiver.run(DataXceiver.java:95)
>>        at java.lang.Thread.run(Thread.java:619)
>>
>> [For future runs, will try to turn on Java GC logging as well as per your suggestion.]
>>
>> 3.  You wrote <<< There is no line after the above?  A region with a startkey of 0031841132?  >>>
>>
>> I just cut-paste a section of the PE log and the scan of the '.META.' from shell. But there were many such regions for which the PE client reported errors of the form:
>>
>> 10/01/25 12:31:37 DEBUG client.HConnectionManager$TableServers: locateRegionInMeta attempt 1 of 10 failed; retrying after sleep of 1000 because: Connection refused
>> 10/01/25 12:31:37 DEBUG client.HConnectionManager$TableServers: Removed .META.,,1 for tableName=.META. from cache because of TestTable,0015411453,99999999999999
>> 10/01/25 12:31:37 DEBUG client.HConnectionManager$TableServers: Cached location for .META.,,1 is 10.129.68.212:60020
>>
>> and scan of '.META.' revealed that those regions, such as the 'TestTable,0015411453,99999999999999', had splitA & splitB portions in META.
>>
>> <<< Then its a dropped edit of .META.  The parent region -- the one being split got updates which added the splitA and splitB and its offlining but the new region didn't get inserted?  The crash happened just at that time?>>>
>>
>> Didn't fully understand the above.
>>
>> <<< It would be interesting to take a look at the regionserver logs from that time.  Please post if you have a moment so we can take a looksee.>>>
>>
>> Will do. Should I just send it as an attachment to the list? Or is their a recommended way of doing this?
>>
>> regards,
>> Kannan
>> ________________________________________
>> From: saint.ack@gmail.com [saint.ack@gmail.com] On Behalf Of Stack [stack@duboce.net]
>> Sent: Monday, January 25, 2010 8:59 PM
>> To: hbase-dev@hadoop.apache.org
>> Subject: Re: scanner lease expired/region server shutdown
>>
>> What J-D said but with ornamentation.  See below.
>>
>> On Mon, Jan 25, 2010 at 7:14 PM, Kannan Muthukkaruppan
>> <Ka...@facebook.com> wrote:
>>> I was doing some brute force testing - running one instance of PerformanceEvaluation (PE) for writes, and another instance for randomReads.
>>>
>>
>> Whats the cluster your hitting like?  That 5-node thingy?  Whats the
>> hardware profile.
>>
>>
>>> One of the region servers went down after a while. [This is on 0.20.2]. The region server log had things like:
>>>
>>> 2010-01-25 11:33:39,416 WARN org.apache.hadoop.hbase.regionserver.HLog: IPC Server handler 34 on 60020 took 65190ms appending an edit to hlog; editcount=27878
>>
>> This is saying that it took 65 seconds to append to hdfs.  What was
>> going on at that time?  A fat GC in the regionserver or over in a
>> Datanode?  You can enable GC logging uncommenting stuff in the
>> hbase-env.sh.   Feed the GC log to https://gchisto.dev.java.net/
>> (suggested by the zookeeper lads).  Its good for finding the long
>> pauses.  We should find the logs around the long GC pause.  Its
>> probably a failed promotion that brought on the stop-the-world GC.  Or
>> your HDFS was struggling?
>>
>> I was going to say that the PE puts the worse kind of loading on the
>> hbase cache -- nothing sticks around -- but looking at your numbers
>> below, cache seems to be working pretty well.
>>
>>> 2010-01-25 11:33:39,416 DEBUG org.apache.hadoop.hbase.io.hfile.LruBlockCache: Cache Stats: Sizes: Total=1815.4441MB (1903631104), Free=581.23096MB (609464832),\
>>>  Max=2396.675MB (2513095936), Counts: Blocks=28216, Access=10904689, Hit=9313612, Miss=1591077, Evictions=110, Evicted=414336, Ratios: Hit Ratio=85.40924191474\
>>> 915%, Miss Ratio=14.590759575366974%, Evicted/Run=3766.69091796875
>>> 2010-01-25 11:33:39,417 WARN org.apache.hadoop.hbase.regionserver.HLog: IPC Server handler 52 on 60020 took 65195ms appending an edit to hlog; editcount=27879
>>> 2010-01-25 11:33:39,417 WARN org.apache.hadoop.hbase.regionserver.HLog: IPC Server handler 75 on 60020 took 65195ms appending an edit to hlog; editcount=27880
>>> 2010-01-25 11:33:39,417 INFO org.apache.hadoop.hbase.regionserver.HRegionServer: Scanner -5220035164735014996 lease expired
>>>
>>> This was followed by:
>>> 2010-01-25 11:33:39,636 WARN org.apache.zookeeper.ClientCnxn: Exception closing session 0x12652c620520001 to sun.nio.ch.SelectionKeyImpl@788ab708<ma...@788ab708>
>>> java.io.IOException: TIMED OUT
>>
>> This is the zk timeout.
>>
>>>
>>> And also errors like:
>>> 2010-01-25 11:33:39,692 ERROR org.apache.hadoop.hbase.regionserver.HRegionServer:
>>> org.apache.hadoop.hbase.UnknownScannerException: Name: -5220035164735014996
>>>
>>
>> These will happen after the above.  The regionserver is on its way
>> down.  Probably emptied the list of outstanding regions.
>>
>>
>>> Subsequently, I restarted the region server and just started the PE in sequentialWrite mode. It seems to be off to a really slow start. For several regions, it is printing the following:
>>>
>>> 10/01/25 18:12:44 DEBUG client.HConnectionManager$TableServers: locateRegionInMeta attempt 0 of 10 failed; retrying after sleep of 1000 because: region offline: TestTable,0031841132,1264121653303
>>> 10/01/25 18:12:44 DEBUG client.HConnectionManager$TableServers: Removed .META.,,1 for tableName=.META. from cache because of TestTable,0032000000,99999999999999
>>> 10/01/25 18:12:44 DEBUG client.HConnectionManager$TableServers: Cached location for .META.,,1 is 10.129.68.212:60020
>>> 10/01/25 18:12:45 DEBUG client.HConnectionManager$TableServers: locateRegionInMeta attempt 1 of 10 failed; retrying after sleep of 1000 because: region offline: TestTable,0031841132,1264121653303
>>> 10/01/25 18:12:45 DEBUG client.HConnectionManager$TableServers: Removed .META.,,1 for tableName=.META. from cache because of TestTable,0032000000,99999999999999
>>> [..and so on..]
>>> 10/01/25 18:12:59 DEBUG client.HConnectionManager$TableServers: Cached location for .META.,,1 is 10.129.68.212:60020
>>> 10/01/25 18:13:07 DEBUG client.HConnectionManager$TableServers: locateRegionInMeta attempt 8 of 10 failed; retrying after sleep of 16000 because: region offline: TestTable,0031841132,1264121653303
>>> 10/01/25 18:13:07 DEBUG client.HConnectionManager$TableServers: Removed .META.,,1 for tableName=.META. from cache because of TestTable,0032000000,99999999999999
>>>
>>> When I scan the '.META.', I noticed that most of the regions for which I get the above error were in the middle of (?) a split. For example:
>>>
>>> TestTable,0031841132,126412 column=info:regioninfo, timestamp=1264470099796, value=REGION => {NAME => 'TestTa
>>>  1653303                     ble,0031841132,1264121653303', STARTKEY => '0031841132', ENDKEY => '0032057774',
>>>                             ENCODED => 644500792, OFFLINE => true, SPLIT => true, TABLE => {{NAME => 'TestTab
>>>                             le', FAMILIES => [{NAME => 'info', VERSIONS => '3', COMPRESSION => 'NONE', TTL =>
>>>                              '2147483647', BLOCKSIZE => '65536', IN_MEMORY => 'false', BLOCKCACHE => 'true'}]
>>>                             }}
>>>  TestTable,0031841132,126412 column=info:server, timestamp=1264121655148, value=10.129.68.214:60020
>>>  1653303
>>>  TestTable,0031841132,126412 column=info:serverstartcode, timestamp=1264121655148, value=1264109117245
>>>  1653303
>>>  TestTable,0031841132,126412 column=info:splitA, timestamp=1264470099796, value=\000\n0031978284\000\000\000\0
>>>  1653303                     01&g\016\357\277\275\024\"TestTable,0031841132,1264449401108\000\n0031841132\000\
>>>                             000\000\005\tTestTable\000\000\000\000\000\002\000\000\000\aIS_ROOT\000\000\000\0
>>>                             05false\000\000\000\aIS_META\000\000\000\005false\000\000\000\001\a\004info\000\0
>>>                             00\000\a\000\000\000\vBLOOMFILTER\000\000\000\005false\000\000\000\vCOMPRESSION\0
>>>                             00\000\000\004NONE\000\000\000\bVERSIONS\000\000\000\0013\000\000\000\003TTL\000\
>>>                             000\000\n2147483647\000\000\000\tBLOCKSIZE\000\000\000\00565536\000\000\000\tIN_M
>>>                             EMORY\000\000\000\005false\000\000\000\nBLOCKCACHE\000\000\000\004true\357\277\27
>>>                             5\337\274#
>>>  TestTable,0031841132,126412 column=info:splitB, timestamp=1264470099796, value=\000\n0032057774\000\000\000\0
>>>  1653303                     01&g\016\357\277\275\024\"TestTable,0031978284,1264449401108\000\n0031978284\000\
>>>                             000\000\005\tTestTable\000\000\000\000\000\002\000\000\000\aIS_ROOT\000\000\000\0
>>>                             05false\000\000\000\aIS_META\000\000\000\005false\000\000\000\001\a\004info\000\0
>>>                             00\000\a\000\000\000\vBLOOMFILTER\000\000\000\005false\000\000\000\vCOMPRESSION\0
>>>                             00\000\000\004NONE\000\000\000\bVERSIONS\000\000\000\0013\000\000\000\003TTL\000\
>>>                             000\000\n2147483647\000\000\000\tBLOCKSIZE\000\000\000\00565536\000\000\000\tIN_M
>>>                             EMORY\000\000\000\005false\000\000\000\nBLOCKCACHE\000\000\000\004truej\357\277\2
>>>                             75\312\262
>>>
>>> Any thoughts/ideas on what might be going on? Appreciate any input in helping understand what might be going on.
>>>
>>
>> There is no line after the above?  A region with a startkey of
>> 0031841132?  Then its a dropped edit of .META.  The parent region --
>> the one being split got updates which added the splitA and splitB and
>> its offlining but the new region didn't get inserted?  The crash
>> happened just at that time?
>>
>> It would be interesting to take a look at the regionserver logs from
>> that time.  Please post if you have a moment so we can take a looksee.
>>
>> Above kinda thing is what the master rewrite is about moving state
>> transitions up to zk so atomic over cluster moving regions through
>> transitions rather than as here, a multi-row update might not all go
>> through as things currently work.
>>
>> St.Ack
>>
>>> Regards,
>>> Kannan
>>>
>>>
>>>
>>
>

RE: scanner lease expired/region server shutdown

Posted by Kannan Muthukkaruppan <Ka...@facebook.com>.
Some quick answers to the questions. Will follow up later today with more details.

1) the test nodes were 8 cpu machines.

2) region server log does report ulimit of 64k.

Mon Jan 25 17:41:22 PST 2010 Starting regionserver on titantest013.ash1.facebook.com
ulimit -n 65535
2010-01-25 17:41:23,605 INFO org.apache.hadoop.hbase.regionserver.HRegionServer: vmInputArguments=[-Xmx12000m, -XX:+HeapDumpOnOutOfMemoryError, -XX:+UseConcMarkSweepGC, -XX:+CMSIncrementalMode, -Dhbase.log.dir=/usr/local/hadoop/logs/HBASE-kannan1, -Dhbase.log.file=hbase-hadoop-regionserver-tit\
antest013.ash1.facebook.com.log, -Dhbase.home.dir=/usr/local/hadoop/HBASE-kannan1/bin/.., -Dhbase.id.str=hadoop, -Dhbase.root.logger=INFO,DRFA, -Djava.library.path=/usr/local/hadoop/HBASE-kannan1/bin/../lib/native/Linux-amd64-64]

3) An aside. Still using -XX:+CMSIncrementalMode. But I recall reading that for machines with more than 2 cores, the recommendation is to not use the incremental mode.
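
For the next run I am planning to drop it and turn on GC logging at the same
time, roughly along these lines in hbase-env.sh (flags from memory; the gc
log path is just an example):

  export HBASE_OPTS="-Xmx12000m -XX:+HeapDumpOnOutOfMemoryError -XX:+UseConcMarkSweepGC \
      -verbose:gc -XX:+PrintGCDetails -XX:+PrintGCTimeStamps \
      -Xloggc:/usr/local/hadoop/logs/HBASE-kannan1/gc-hbase.log"

so there is something to feed to gchisto next time this happens.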

-----Original Message-----
From: saint.ack@gmail.com [mailto:saint.ack@gmail.com] On Behalf Of Stack
Sent: Tuesday, January 26, 2010 8:32 AM
To: hbase-dev@hadoop.apache.org
Subject: Re: scanner lease expired/region server shutdown

This is suspicious.  Make sure that second line in master or
regionserver log where it prints out ulimit for the process has 64k
(and not 1024).  You may not have set it for the DN user or setting it
on your linux may be ornery (See
http://wiki.apache.org/hadoop/Hbase/Troubleshooting#A6 for note about
red hat based systems have upper bound for system, or here, where an
interesting issue around kernel limits:
http://wiki.apache.org/hadoop/Hbase/FAQ#A6).
St.Ack

On Tue, Jan 26, 2010 at 7:10 AM, Kannan Muthukkaruppan
<Ka...@facebook.com> wrote:
>
> Looking further up in the logs (about 20 minutes prior in the logs when errors first started happening), I noticed the following.
>
> btw, ulimit -a shows that I have "open files" set to 64k. Is that not sufficient?
>
> 2010-01-25 11:10:21,774 WARN org.apache.hadoop.hdfs.server.datanode.DataNode: DatanodeRegistration(10.129.68.212:50010, storageID=DS-1418567969-10.129.68.212-50010-1263610251776, infoPort=50075, ipcPort=50020):Data\
> XceiveServer: java.io.IOException: Too many open files
>        at sun.nio.ch.ServerSocketChannelImpl.accept0(Native Method)
>        at sun.nio.ch.ServerSocketChannelImpl.accept(ServerSocketChannelImpl.java:145)
>        at sun.nio.ch.ServerSocketAdaptor.accept(ServerSocketAdaptor.java:84)
>        at org.apache.hadoop.hdfs.server.datanode.DataXceiverServer.run(DataXceiverServer.java:130)
>        at java.lang.Thread.run(Thread.java:619)
>
> 2010-01-25 11:10:21,566 WARN org.apache.hadoop.hdfs.server.datanode.DataNode: DatanodeRegistration(10.129.68.212:50010, storageID=DS-1418567969-10.129.68.212-50010-1263610251776, infoPort=50075, ipcPort=50020):Got \
> exception while serving blk_3332344970774019423_10249 to /10.129.68.212:
> java.io.FileNotFoundException: /mnt/d1/HDFS-kannan1/current/subdir23/blk_3332344970774019423_10249.meta (Too many open files)
>        at java.io.FileInputStream.open(Native Method)
>        at java.io.FileInputStream.<init>(FileInputStream.java:106)
>        at org.apache.hadoop.hdfs.server.datanode.FSDataset.getMetaDataInputStream(FSDataset.java:682)
>        at org.apache.hadoop.hdfs.server.datanode.BlockSender.<init>(BlockSender.java:97)
>        at org.apache.hadoop.hdfs.server.datanode.DataXceiver.readBlock(DataXceiver.java:172)
>        at org.apache.hadoop.hdfs.server.datanode.DataXceiver.run(DataXceiver.java:95)
>        at java.lang.Thread.run(Thread.java:619)
>
>
>
> ________________________________________
> From: Kannan Muthukkaruppan [Kannan@facebook.com]
> Sent: Tuesday, January 26, 2010 7:01 AM
> To: hbase-dev@hadoop.apache.org
> Subject: RE: scanner lease expired/region server shutdown
>
> 1. Yes, it is a 5 node, setup.
>
> 1 Name Node/4 Data Nodes. Of the 4 DN, one is running the HBase Master, and the other three are running region servers. ZK is on all the same 5 nodes. Should ideally have separated this out. The nodes are 16GB, 4 disk machines.
>
> 2. I examined the HDFS datanode log on the same machine around that time the problems happened, and saw this:
>
> 2010-01-25 11:33:09,531 WARN org.apache.hadoop.hdfs.server.datanode.DataNode: DatanodeRegistration(10.129.68.212:50010, storageID=DS-1418567969-10.129.68.212-50010-1263610251776, infoPort=50075, ipcPort=\
> 50020):Got exception while serving blk_5691809099673541164_10475 to /10.129.68.212:
> java.net.SocketTimeoutException: 480000 millis timeout while waiting for channel to be ready for write. ch : java.nio.channels.SocketChannel[connected local=/10.129.68.212:50010 remote=/10.129.68.212:477\
> 29]
>        at org.apache.hadoop.net.SocketIOWithTimeout.waitForIO(SocketIOWithTimeout.java:246)
>        at org.apache.hadoop.net.SocketOutputStream.waitForWritable(SocketOutputStream.java:159)
>        at org.apache.hadoop.net.SocketOutputStream.transferToFully(SocketOutputStream.java:198)
>        at org.apache.hadoop.hdfs.server.datanode.BlockSender.sendChunks(BlockSender.java:313)
>        at org.apache.hadoop.hdfs.server.datanode.BlockSender.sendBlock(BlockSender.java:400)
>        at org.apache.hadoop.hdfs.server.datanode.DataXceiver.readBlock(DataXceiver.java:180)
>        at org.apache.hadoop.hdfs.server.datanode.DataXceiver.run(DataXceiver.java:95)
>        at java.lang.Thread.run(Thread.java:619)
>
> [For future runs, will try to turn on Java GC logging as well as per your suggestion.]
>
> 3.  You wrote <<< There is no line after the above?  A region with a startkey of 0031841132?  >>>
>
> I just cut-paste a section of the PE log and the scan of the '.META.' from shell. But there were many such regions for which the PE client reported errors of the form:
>
> 10/01/25 12:31:37 DEBUG client.HConnectionManager$TableServers: locateRegionInMeta attempt 1 of 10 failed; retrying after sleep of 1000 because: Connection refused
> 10/01/25 12:31:37 DEBUG client.HConnectionManager$TableServers: Removed .META.,,1 for tableName=.META. from cache because of TestTable,0015411453,99999999999999
> 10/01/25 12:31:37 DEBUG client.HConnectionManager$TableServers: Cached location for .META.,,1 is 10.129.68.212:60020
>
> and scan of '.META.' revealed that those regions, such as the 'TestTable,0015411453,99999999999999', had splitA & splitB portions in META.
>
> <<< Then its a dropped edit of .META.  The parent region -- the one being split got updates which added the splitA and splitB and its offlining but the new region didn't get inserted?  The crash happened just at that time?>>>
>
> Didn't fully understand the above.
>
> <<< It would be interesting to take a look at the regionserver logs from that time.  Please post if you have a moment so we can take a looksee.>>>
>
> Will do. Should I just send it as an attachment to the list? Or is their a recommended way of doing this?
>
> regards,
> Kannan
> ________________________________________
> From: saint.ack@gmail.com [saint.ack@gmail.com] On Behalf Of Stack [stack@duboce.net]
> Sent: Monday, January 25, 2010 8:59 PM
> To: hbase-dev@hadoop.apache.org
> Subject: Re: scanner lease expired/region server shutdown
>
> What J-D said but with ornamentation.  See below.
>
> On Mon, Jan 25, 2010 at 7:14 PM, Kannan Muthukkaruppan
> <Ka...@facebook.com> wrote:
>> I was doing some brute force testing - running one instance of PerformanceEvaluation (PE) for writes, and another instance for randomReads.
>>
>
> Whats the cluster your hitting like?  That 5-node thingy?  Whats the
> hardware profile.
>
>
>> One of the region servers went down after a while. [This is on 0.20.2]. The region server log had things like:
>>
>> 2010-01-25 11:33:39,416 WARN org.apache.hadoop.hbase.regionserver.HLog: IPC Server handler 34 on 60020 took 65190ms appending an edit to hlog; editcount=27878
>
> This is saying that it took 65 seconds to append to hdfs.  What was
> going on at that time?  A fat GC in the regionserver or over in a
> Datanode?  You can enable GC logging uncommenting stuff in the
> hbase-env.sh.   Feed the GC log to https://gchisto.dev.java.net/
> (suggested by the zookeeper lads).  Its good for finding the long
> pauses.  We should find the logs around the long GC pause.  Its
> probably a failed promotion that brought on the stop-the-world GC.  Or
> your HDFS was struggling?
>
> I was going to say that the PE puts the worse kind of loading on the
> hbase cache -- nothing sticks around -- but looking at your numbers
> below, cache seems to be working pretty well.
>
>> 2010-01-25 11:33:39,416 DEBUG org.apache.hadoop.hbase.io.hfile.LruBlockCache: Cache Stats: Sizes: Total=1815.4441MB (1903631104), Free=581.23096MB (609464832),\
>>  Max=2396.675MB (2513095936), Counts: Blocks=28216, Access=10904689, Hit=9313612, Miss=1591077, Evictions=110, Evicted=414336, Ratios: Hit Ratio=85.40924191474\
>> 915%, Miss Ratio=14.590759575366974%, Evicted/Run=3766.69091796875
>> 2010-01-25 11:33:39,417 WARN org.apache.hadoop.hbase.regionserver.HLog: IPC Server handler 52 on 60020 took 65195ms appending an edit to hlog; editcount=27879
>> 2010-01-25 11:33:39,417 WARN org.apache.hadoop.hbase.regionserver.HLog: IPC Server handler 75 on 60020 took 65195ms appending an edit to hlog; editcount=27880
>> 2010-01-25 11:33:39,417 INFO org.apache.hadoop.hbase.regionserver.HRegionServer: Scanner -5220035164735014996 lease expired
>>
>> This was followed by:
>> 2010-01-25 11:33:39,636 WARN org.apache.zookeeper.ClientCnxn: Exception closing session 0x12652c620520001 to sun.nio.ch.SelectionKeyImpl@788ab708<ma...@788ab708>
>> java.io.IOException: TIMED OUT
>
> This is the zk timeout.
>
>>
>> And also errors like:
>> 2010-01-25 11:33:39,692 ERROR org.apache.hadoop.hbase.regionserver.HRegionServer:
>> org.apache.hadoop.hbase.UnknownScannerException: Name: -5220035164735014996
>>
>
> These will happen after the above.  The regionserver is on its way
> down.  Probably emptied the list of outstanding regions.
>
>
>> Subsequently, I restarted the region server and just started the PE in sequentialWrite mode. It seems to be off to a really slow start. For several regions, it is printing the following:
>>
>> 10/01/25 18:12:44 DEBUG client.HConnectionManager$TableServers: locateRegionInMeta attempt 0 of 10 failed; retrying after sleep of 1000 because: region offline: TestTable,0031841132,1264121653303
>> 10/01/25 18:12:44 DEBUG client.HConnectionManager$TableServers: Removed .META.,,1 for tableName=.META. from cache because of TestTable,0032000000,99999999999999
>> 10/01/25 18:12:44 DEBUG client.HConnectionManager$TableServers: Cached location for .META.,,1 is 10.129.68.212:60020
>> 10/01/25 18:12:45 DEBUG client.HConnectionManager$TableServers: locateRegionInMeta attempt 1 of 10 failed; retrying after sleep of 1000 because: region offline: TestTable,0031841132,1264121653303
>> 10/01/25 18:12:45 DEBUG client.HConnectionManager$TableServers: Removed .META.,,1 for tableName=.META. from cache because of TestTable,0032000000,99999999999999
>> [..and so on..]
>> 10/01/25 18:12:59 DEBUG client.HConnectionManager$TableServers: Cached location for .META.,,1 is 10.129.68.212:60020
>> 10/01/25 18:13:07 DEBUG client.HConnectionManager$TableServers: locateRegionInMeta attempt 8 of 10 failed; retrying after sleep of 16000 because: region offline: TestTable,0031841132,1264121653303
>> 10/01/25 18:13:07 DEBUG client.HConnectionManager$TableServers: Removed .META.,,1 for tableName=.META. from cache because of TestTable,0032000000,99999999999999
>>
>> When I scan the '.META.', I noticed that most of the regions for which I get the above error were in the middle of (?) a split. For example:
>>
>> TestTable,0031841132,126412 column=info:regioninfo, timestamp=1264470099796, value=REGION => {NAME => 'TestTa
>>  1653303                     ble,0031841132,1264121653303', STARTKEY => '0031841132', ENDKEY => '0032057774',
>>                             ENCODED => 644500792, OFFLINE => true, SPLIT => true, TABLE => {{NAME => 'TestTab
>>                             le', FAMILIES => [{NAME => 'info', VERSIONS => '3', COMPRESSION => 'NONE', TTL =>
>>                              '2147483647', BLOCKSIZE => '65536', IN_MEMORY => 'false', BLOCKCACHE => 'true'}]
>>                             }}
>>  TestTable,0031841132,126412 column=info:server, timestamp=1264121655148, value=10.129.68.214:60020
>>  1653303
>>  TestTable,0031841132,126412 column=info:serverstartcode, timestamp=1264121655148, value=1264109117245
>>  1653303
>>  TestTable,0031841132,126412 column=info:splitA, timestamp=1264470099796, value=\000\n0031978284\000\000\000\0
>>  1653303                     01&g\016\357\277\275\024\"TestTable,0031841132,1264449401108\000\n0031841132\000\
>>                             000\000\005\tTestTable\000\000\000\000\000\002\000\000\000\aIS_ROOT\000\000\000\0
>>                             05false\000\000\000\aIS_META\000\000\000\005false\000\000\000\001\a\004info\000\0
>>                             00\000\a\000\000\000\vBLOOMFILTER\000\000\000\005false\000\000\000\vCOMPRESSION\0
>>                             00\000\000\004NONE\000\000\000\bVERSIONS\000\000\000\0013\000\000\000\003TTL\000\
>>                             000\000\n2147483647\000\000\000\tBLOCKSIZE\000\000\000\00565536\000\000\000\tIN_M
>>                             EMORY\000\000\000\005false\000\000\000\nBLOCKCACHE\000\000\000\004true\357\277\27
>>                             5\337\274#
>>  TestTable,0031841132,126412 column=info:splitB, timestamp=1264470099796, value=\000\n0032057774\000\000\000\0
>>  1653303                     01&g\016\357\277\275\024\"TestTable,0031978284,1264449401108\000\n0031978284\000\
>>                             000\000\005\tTestTable\000\000\000\000\000\002\000\000\000\aIS_ROOT\000\000\000\0
>>                             05false\000\000\000\aIS_META\000\000\000\005false\000\000\000\001\a\004info\000\0
>>                             00\000\a\000\000\000\vBLOOMFILTER\000\000\000\005false\000\000\000\vCOMPRESSION\0
>>                             00\000\000\004NONE\000\000\000\bVERSIONS\000\000\000\0013\000\000\000\003TTL\000\
>>                             000\000\n2147483647\000\000\000\tBLOCKSIZE\000\000\000\00565536\000\000\000\tIN_M
>>                             EMORY\000\000\000\005false\000\000\000\nBLOCKCACHE\000\000\000\004truej\357\277\2
>>                             75\312\262
>>
>> Any thoughts/ideas on what might be going on? Appreciate any input in helping understand what might be going on.
>>
>
> There is no line after the above?  A region with a startkey of
> 0031841132?  Then its a dropped edit of .META.  The parent region --
> the one being split got updates which added the splitA and splitB and
> its offlining but the new region didn't get inserted?  The crash
> happened just at that time?
>
> It would be interesting to take a look at the regionserver logs from
> that time.  Please post if you have a moment so we can take a looksee.
>
> Above kinda thing is what the master rewrite is about moving state
> transitions up to zk so atomic over cluster moving regions through
> transitions rather than as here, a multi-row update might not all go
> through as things currently work.
>
> St.Ack
>
>> Regards,
>> Kannan
>>
>>
>>
>

Re: scanner lease expired/region server shutdown

Posted by Stack <st...@duboce.net>.
This is suspicious.  Make sure the second line in the master or
regionserver log, where it prints out the ulimit for the process, shows
64k (and not 1024).  You may not have set it for the DN user, or setting
it on your linux may be ornery (see
http://wiki.apache.org/hadoop/Hbase/Troubleshooting#A6 for a note about
red hat based systems having a system-wide upper bound, or here, for an
interesting issue around kernel limits:
http://wiki.apache.org/hadoop/Hbase/FAQ#A6).
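
If you want to double-check the limit a running JVM actually got -- rather
than what "ulimit -a" in your login shell reports -- a small sketch like the
one below works on the Sun JDK.  The class is made up for illustration, but
the UnixOperatingSystemMXBean calls are standard; run it as the same user,
and with the same environment, that starts the DataNode or regionserver.

import java.lang.management.ManagementFactory;
import java.lang.management.OperatingSystemMXBean;
import com.sun.management.UnixOperatingSystemMXBean;

public class FdLimitCheck {
  public static void main(String[] args) {
    OperatingSystemMXBean os = ManagementFactory.getOperatingSystemMXBean();
    if (os instanceof UnixOperatingSystemMXBean) {
      UnixOperatingSystemMXBean unix = (UnixOperatingSystemMXBean) os;
      // The limit this process was actually started with (what matters to
      // the DataNode/regionserver), and how many descriptors are in use now.
      System.out.println("max file descriptors:  " + unix.getMaxFileDescriptorCount());
      System.out.println("open file descriptors: " + unix.getOpenFileDescriptorCount());
    } else {
      System.out.println("Not a Sun/Unix JVM; check /proc/<pid>/limits instead.");
    }
  }
}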
St.Ack

On Tue, Jan 26, 2010 at 7:10 AM, Kannan Muthukkaruppan
<Ka...@facebook.com> wrote:
>
> Looking further up in the logs (about 20 minutes prior in the logs when errors first started happening), I noticed the following.
>
> btw, ulimit -a shows that I have "open files" set to 64k. Is that not sufficient?
>
> 2010-01-25 11:10:21,774 WARN org.apache.hadoop.hdfs.server.datanode.DataNode: DatanodeRegistration(10.129.68.212:50010, storageID=DS-1418567969-10.129.68.212-50010-1263610251776, infoPort=50075, ipcPort=50020):Data\
> XceiveServer: java.io.IOException: Too many open files
>        at sun.nio.ch.ServerSocketChannelImpl.accept0(Native Method)
>        at sun.nio.ch.ServerSocketChannelImpl.accept(ServerSocketChannelImpl.java:145)
>        at sun.nio.ch.ServerSocketAdaptor.accept(ServerSocketAdaptor.java:84)
>        at org.apache.hadoop.hdfs.server.datanode.DataXceiverServer.run(DataXceiverServer.java:130)
>        at java.lang.Thread.run(Thread.java:619)
>
> 2010-01-25 11:10:21,566 WARN org.apache.hadoop.hdfs.server.datanode.DataNode: DatanodeRegistration(10.129.68.212:50010, storageID=DS-1418567969-10.129.68.212-50010-1263610251776, infoPort=50075, ipcPort=50020):Got \
> exception while serving blk_3332344970774019423_10249 to /10.129.68.212:
> java.io.FileNotFoundException: /mnt/d1/HDFS-kannan1/current/subdir23/blk_3332344970774019423_10249.meta (Too many open files)
>        at java.io.FileInputStream.open(Native Method)
>        at java.io.FileInputStream.<init>(FileInputStream.java:106)
>        at org.apache.hadoop.hdfs.server.datanode.FSDataset.getMetaDataInputStream(FSDataset.java:682)
>        at org.apache.hadoop.hdfs.server.datanode.BlockSender.<init>(BlockSender.java:97)
>        at org.apache.hadoop.hdfs.server.datanode.DataXceiver.readBlock(DataXceiver.java:172)
>        at org.apache.hadoop.hdfs.server.datanode.DataXceiver.run(DataXceiver.java:95)
>        at java.lang.Thread.run(Thread.java:619)
>
>
>
> ________________________________________
> From: Kannan Muthukkaruppan [Kannan@facebook.com]
> Sent: Tuesday, January 26, 2010 7:01 AM
> To: hbase-dev@hadoop.apache.org
> Subject: RE: scanner lease expired/region server shutdown
>
> 1. Yes, it is a 5 node, setup.
>
> 1 Name Node/4 Data Nodes. Of the 4 DN, one is running the HBase Master, and the other three are running region servers. ZK is on all the same 5 nodes. Should ideally have separated this out. The nodes are 16GB, 4 disk machines.
>
> 2. I examined the HDFS datanode log on the same machine around that time the problems happened, and saw this:
>
> 2010-01-25 11:33:09,531 WARN org.apache.hadoop.hdfs.server.datanode.DataNode: DatanodeRegistration(10.129.68.212:50010, storageID=DS-1418567969-10.129.68.212-50010-1263610251776, infoPort=50075, ipcPort=\
> 50020):Got exception while serving blk_5691809099673541164_10475 to /10.129.68.212:
> java.net.SocketTimeoutException: 480000 millis timeout while waiting for channel to be ready for write. ch : java.nio.channels.SocketChannel[connected local=/10.129.68.212:50010 remote=/10.129.68.212:477\
> 29]
>        at org.apache.hadoop.net.SocketIOWithTimeout.waitForIO(SocketIOWithTimeout.java:246)
>        at org.apache.hadoop.net.SocketOutputStream.waitForWritable(SocketOutputStream.java:159)
>        at org.apache.hadoop.net.SocketOutputStream.transferToFully(SocketOutputStream.java:198)
>        at org.apache.hadoop.hdfs.server.datanode.BlockSender.sendChunks(BlockSender.java:313)
>        at org.apache.hadoop.hdfs.server.datanode.BlockSender.sendBlock(BlockSender.java:400)
>        at org.apache.hadoop.hdfs.server.datanode.DataXceiver.readBlock(DataXceiver.java:180)
>        at org.apache.hadoop.hdfs.server.datanode.DataXceiver.run(DataXceiver.java:95)
>        at java.lang.Thread.run(Thread.java:619)
>
> [For future runs, will try to turn on Java GC logging as well as per your suggestion.]
>
> 3.  You wrote <<< There is no line after the above?  A region with a startkey of 0031841132?  >>>
>
> I just cut-paste a section of the PE log and the scan of the '.META.' from shell. But there were many such regions for which the PE client reported errors of the form:
>
> 10/01/25 12:31:37 DEBUG client.HConnectionManager$TableServers: locateRegionInMeta attempt 1 of 10 failed; retrying after sleep of 1000 because: Connection refused
> 10/01/25 12:31:37 DEBUG client.HConnectionManager$TableServers: Removed .META.,,1 for tableName=.META. from cache because of TestTable,0015411453,99999999999999
> 10/01/25 12:31:37 DEBUG client.HConnectionManager$TableServers: Cached location for .META.,,1 is 10.129.68.212:60020
>
> and scan of '.META.' revealed that those regions, such as the 'TestTable,0015411453,99999999999999', had splitA & splitB portions in META.
>
> <<< Then its a dropped edit of .META.  The parent region -- the one being split got updates which added the splitA and splitB and its offlining but the new region didn't get inserted?  The crash happened just at that time?>>>
>
> Didn't fully understand the above.
>
> <<< It would be interesting to take a look at the regionserver logs from that time.  Please post if you have a moment so we can take a looksee.>>>
>
> Will do. Should I just send it as an attachment to the list? Or is their a recommended way of doing this?
>
> regards,
> Kannan
> ________________________________________
> From: saint.ack@gmail.com [saint.ack@gmail.com] On Behalf Of Stack [stack@duboce.net]
> Sent: Monday, January 25, 2010 8:59 PM
> To: hbase-dev@hadoop.apache.org
> Subject: Re: scanner lease expired/region server shutdown
>
> What J-D said but with ornamentation.  See below.
>
> On Mon, Jan 25, 2010 at 7:14 PM, Kannan Muthukkaruppan
> <Ka...@facebook.com> wrote:
>> I was doing some brute force testing - running one instance of PerformanceEvaluation (PE) for writes, and another instance for randomReads.
>>
>
> Whats the cluster your hitting like?  That 5-node thingy?  Whats the
> hardware profile.
>
>
>> One of the region servers went down after a while. [This is on 0.20.2]. The region server log had things like:
>>
>> 2010-01-25 11:33:39,416 WARN org.apache.hadoop.hbase.regionserver.HLog: IPC Server handler 34 on 60020 took 65190ms appending an edit to hlog; editcount=27878
>
> This is saying that it took 65 seconds to append to hdfs.  What was
> going on at that time?  A fat GC in the regionserver or over in a
> Datanode?  You can enable GC logging uncommenting stuff in the
> hbase-env.sh.   Feed the GC log to https://gchisto.dev.java.net/
> (suggested by the zookeeper lads).  Its good for finding the long
> pauses.  We should find the logs around the long GC pause.  Its
> probably a failed promotion that brought on the stop-the-world GC.  Or
> your HDFS was struggling?
>
> I was going to say that the PE puts the worse kind of loading on the
> hbase cache -- nothing sticks around -- but looking at your numbers
> below, cache seems to be working pretty well.
>
>> 2010-01-25 11:33:39,416 DEBUG org.apache.hadoop.hbase.io.hfile.LruBlockCache: Cache Stats: Sizes: Total=1815.4441MB (1903631104), Free=581.23096MB (609464832),\
>>  Max=2396.675MB (2513095936), Counts: Blocks=28216, Access=10904689, Hit=9313612, Miss=1591077, Evictions=110, Evicted=414336, Ratios: Hit Ratio=85.40924191474\
>> 915%, Miss Ratio=14.590759575366974%, Evicted/Run=3766.69091796875
>> 2010-01-25 11:33:39,417 WARN org.apache.hadoop.hbase.regionserver.HLog: IPC Server handler 52 on 60020 took 65195ms appending an edit to hlog; editcount=27879
>> 2010-01-25 11:33:39,417 WARN org.apache.hadoop.hbase.regionserver.HLog: IPC Server handler 75 on 60020 took 65195ms appending an edit to hlog; editcount=27880
>> 2010-01-25 11:33:39,417 INFO org.apache.hadoop.hbase.regionserver.HRegionServer: Scanner -5220035164735014996 lease expired
>>
>> This was followed by:
>> 2010-01-25 11:33:39,636 WARN org.apache.zookeeper.ClientCnxn: Exception closing session 0x12652c620520001 to sun.nio.ch.SelectionKeyImpl@788ab708<ma...@788ab708>
>> java.io.IOException: TIMED OUT
>
> This is the zk timeout.
>
>>
>> And also errors like:
>> 2010-01-25 11:33:39,692 ERROR org.apache.hadoop.hbase.regionserver.HRegionServer:
>> org.apache.hadoop.hbase.UnknownScannerException: Name: -5220035164735014996
>>
>
> These will happen after the above.  The regionserver is on its way
> down.  Probably emptied the list of outstanding regions.
>
>
>> Subsequently, I restarted the region server and just started the PE in sequentialWrite mode. It seems to be off to a really slow start. For several regions, it is printing the following:
>>
>> 10/01/25 18:12:44 DEBUG client.HConnectionManager$TableServers: locateRegionInMeta attempt 0 of 10 failed; retrying after sleep of 1000 because: region offline: TestTable,0031841132,1264121653303
>> 10/01/25 18:12:44 DEBUG client.HConnectionManager$TableServers: Removed .META.,,1 for tableName=.META. from cache because of TestTable,0032000000,99999999999999
>> 10/01/25 18:12:44 DEBUG client.HConnectionManager$TableServers: Cached location for .META.,,1 is 10.129.68.212:60020
>> 10/01/25 18:12:45 DEBUG client.HConnectionManager$TableServers: locateRegionInMeta attempt 1 of 10 failed; retrying after sleep of 1000 because: region offline: TestTable,0031841132,1264121653303
>> 10/01/25 18:12:45 DEBUG client.HConnectionManager$TableServers: Removed .META.,,1 for tableName=.META. from cache because of TestTable,0032000000,99999999999999
>> [..and so on..]
>> 10/01/25 18:12:59 DEBUG client.HConnectionManager$TableServers: Cached location for .META.,,1 is 10.129.68.212:60020
>> 10/01/25 18:13:07 DEBUG client.HConnectionManager$TableServers: locateRegionInMeta attempt 8 of 10 failed; retrying after sleep of 16000 because: region offline: TestTable,0031841132,1264121653303
>> 10/01/25 18:13:07 DEBUG client.HConnectionManager$TableServers: Removed .META.,,1 for tableName=.META. from cache because of TestTable,0032000000,99999999999999
>>
>> When I scan the '.META.', I noticed that most of the regions for which I get the above error were in the middle of (?) a split. For example:
>>
>> TestTable,0031841132,126412 column=info:regioninfo, timestamp=1264470099796, value=REGION => {NAME => 'TestTa
>>  1653303                     ble,0031841132,1264121653303', STARTKEY => '0031841132', ENDKEY => '0032057774',
>>                             ENCODED => 644500792, OFFLINE => true, SPLIT => true, TABLE => {{NAME => 'TestTab
>>                             le', FAMILIES => [{NAME => 'info', VERSIONS => '3', COMPRESSION => 'NONE', TTL =>
>>                              '2147483647', BLOCKSIZE => '65536', IN_MEMORY => 'false', BLOCKCACHE => 'true'}]
>>                             }}
>>  TestTable,0031841132,126412 column=info:server, timestamp=1264121655148, value=10.129.68.214:60020
>>  1653303
>>  TestTable,0031841132,126412 column=info:serverstartcode, timestamp=1264121655148, value=1264109117245
>>  1653303
>>  TestTable,0031841132,126412 column=info:splitA, timestamp=1264470099796, value=\000\n0031978284\000\000\000\0
>>  1653303                     01&g\016\357\277\275\024\"TestTable,0031841132,1264449401108\000\n0031841132\000\
>>                             000\000\005\tTestTable\000\000\000\000\000\002\000\000\000\aIS_ROOT\000\000\000\0
>>                             05false\000\000\000\aIS_META\000\000\000\005false\000\000\000\001\a\004info\000\0
>>                             00\000\a\000\000\000\vBLOOMFILTER\000\000\000\005false\000\000\000\vCOMPRESSION\0
>>                             00\000\000\004NONE\000\000\000\bVERSIONS\000\000\000\0013\000\000\000\003TTL\000\
>>                             000\000\n2147483647\000\000\000\tBLOCKSIZE\000\000\000\00565536\000\000\000\tIN_M
>>                             EMORY\000\000\000\005false\000\000\000\nBLOCKCACHE\000\000\000\004true\357\277\27
>>                             5\337\274#
>>  TestTable,0031841132,126412 column=info:splitB, timestamp=1264470099796, value=\000\n0032057774\000\000\000\0
>>  1653303                     01&g\016\357\277\275\024\"TestTable,0031978284,1264449401108\000\n0031978284\000\
>>                             000\000\005\tTestTable\000\000\000\000\000\002\000\000\000\aIS_ROOT\000\000\000\0
>>                             05false\000\000\000\aIS_META\000\000\000\005false\000\000\000\001\a\004info\000\0
>>                             00\000\a\000\000\000\vBLOOMFILTER\000\000\000\005false\000\000\000\vCOMPRESSION\0
>>                             00\000\000\004NONE\000\000\000\bVERSIONS\000\000\000\0013\000\000\000\003TTL\000\
>>                             000\000\n2147483647\000\000\000\tBLOCKSIZE\000\000\000\00565536\000\000\000\tIN_M
>>                             EMORY\000\000\000\005false\000\000\000\nBLOCKCACHE\000\000\000\004truej\357\277\2
>>                             75\312\262
>>
>> Any thoughts/ideas on what might be going on? Appreciate any input in helping understand what might be going on.
>>
>
> There is no line after the above?  A region with a startkey of
> 0031841132?  Then its a dropped edit of .META.  The parent region --
> the one being split got updates which added the splitA and splitB and
> its offlining but the new region didn't get inserted?  The crash
> happened just at that time?
>
> It would be interesting to take a look at the regionserver logs from
> that time.  Please post if you have a moment so we can take a looksee.
>
> Above kinda thing is what the master rewrite is about moving state
> transitions up to zk so atomic over cluster moving regions through
> transitions rather than as here, a multi-row update might not all go
> through as things currently work.
>
> St.Ack
>
>> Regards,
>> Kannan
>>
>>
>>
>

RE: scanner lease expired/region server shutdown

Posted by Kannan Muthukkaruppan <Ka...@facebook.com>.
Looking further up in the logs (about 20 minutes prior, when errors first started happening), I noticed the following.

btw, ulimit -a shows that I have "open files" set to 64k. Is that not sufficient?

2010-01-25 11:10:21,774 WARN org.apache.hadoop.hdfs.server.datanode.DataNode: DatanodeRegistration(10.129.68.212:50010, storageID=DS-1418567969-10.129.68.212-50010-1263610251776, infoPort=50075, ipcPort=50020):Data\
XceiveServer: java.io.IOException: Too many open files
        at sun.nio.ch.ServerSocketChannelImpl.accept0(Native Method)
        at sun.nio.ch.ServerSocketChannelImpl.accept(ServerSocketChannelImpl.java:145)
        at sun.nio.ch.ServerSocketAdaptor.accept(ServerSocketAdaptor.java:84)
        at org.apache.hadoop.hdfs.server.datanode.DataXceiverServer.run(DataXceiverServer.java:130)
        at java.lang.Thread.run(Thread.java:619)

2010-01-25 11:10:21,566 WARN org.apache.hadoop.hdfs.server.datanode.DataNode: DatanodeRegistration(10.129.68.212:50010, storageID=DS-1418567969-10.129.68.212-50010-1263610251776, infoPort=50075, ipcPort=50020):Got \
exception while serving blk_3332344970774019423_10249 to /10.129.68.212:
java.io.FileNotFoundException: /mnt/d1/HDFS-kannan1/current/subdir23/blk_3332344970774019423_10249.meta (Too many open files)
        at java.io.FileInputStream.open(Native Method)
        at java.io.FileInputStream.<init>(FileInputStream.java:106)
        at org.apache.hadoop.hdfs.server.datanode.FSDataset.getMetaDataInputStream(FSDataset.java:682)
        at org.apache.hadoop.hdfs.server.datanode.BlockSender.<init>(BlockSender.java:97)
        at org.apache.hadoop.hdfs.server.datanode.DataXceiver.readBlock(DataXceiver.java:172)
        at org.apache.hadoop.hdfs.server.datanode.DataXceiver.run(DataXceiver.java:95)
        at java.lang.Thread.run(Thread.java:619)



________________________________________
From: Kannan Muthukkaruppan [Kannan@facebook.com]
Sent: Tuesday, January 26, 2010 7:01 AM
To: hbase-dev@hadoop.apache.org
Subject: RE: scanner lease expired/region server shutdown

1. Yes, it is a 5 node, setup.

1 Name Node/4 Data Nodes. Of the 4 DN, one is running the HBase Master, and the other three are running region servers. ZK is on all the same 5 nodes. Should ideally have separated this out. The nodes are 16GB, 4 disk machines.

2. I examined the HDFS datanode log on the same machine around that time the problems happened, and saw this:

2010-01-25 11:33:09,531 WARN org.apache.hadoop.hdfs.server.datanode.DataNode: DatanodeRegistration(10.129.68.212:50010, storageID=DS-1418567969-10.129.68.212-50010-1263610251776, infoPort=50075, ipcPort=\
50020):Got exception while serving blk_5691809099673541164_10475 to /10.129.68.212:
java.net.SocketTimeoutException: 480000 millis timeout while waiting for channel to be ready for write. ch : java.nio.channels.SocketChannel[connected local=/10.129.68.212:50010 remote=/10.129.68.212:477\
29]
        at org.apache.hadoop.net.SocketIOWithTimeout.waitForIO(SocketIOWithTimeout.java:246)
        at org.apache.hadoop.net.SocketOutputStream.waitForWritable(SocketOutputStream.java:159)
        at org.apache.hadoop.net.SocketOutputStream.transferToFully(SocketOutputStream.java:198)
        at org.apache.hadoop.hdfs.server.datanode.BlockSender.sendChunks(BlockSender.java:313)
        at org.apache.hadoop.hdfs.server.datanode.BlockSender.sendBlock(BlockSender.java:400)
        at org.apache.hadoop.hdfs.server.datanode.DataXceiver.readBlock(DataXceiver.java:180)
        at org.apache.hadoop.hdfs.server.datanode.DataXceiver.run(DataXceiver.java:95)
        at java.lang.Thread.run(Thread.java:619)

[For future runs, will try to turn on Java GC logging as well as per your suggestion.]

3.  You wrote <<< There is no line after the above?  A region with a startkey of 0031841132?  >>>

I just cut-paste a section of the PE log and the scan of the '.META.' from shell. But there were many such regions for which the PE client reported errors of the form:

10/01/25 12:31:37 DEBUG client.HConnectionManager$TableServers: locateRegionInMeta attempt 1 of 10 failed; retrying after sleep of 1000 because: Connection refused
10/01/25 12:31:37 DEBUG client.HConnectionManager$TableServers: Removed .META.,,1 for tableName=.META. from cache because of TestTable,0015411453,99999999999999
10/01/25 12:31:37 DEBUG client.HConnectionManager$TableServers: Cached location for .META.,,1 is 10.129.68.212:60020

and scan of '.META.' revealed that those regions, such as the 'TestTable,0015411453,99999999999999', had splitA & splitB portions in META.

<<< Then its a dropped edit of .META.  The parent region -- the one being split got updates which added the splitA and splitB and its offlining but the new region didn't get inserted?  The crash happened just at that time?>>>

Didn't fully understand the above.

<<< It would be interesting to take a look at the regionserver logs from that time.  Please post if you have a moment so we can take a looksee.>>>

Will do. Should I just send it as an attachment to the list? Or is their a recommended way of doing this?

regards,
Kannan
________________________________________
From: saint.ack@gmail.com [saint.ack@gmail.com] On Behalf Of Stack [stack@duboce.net]
Sent: Monday, January 25, 2010 8:59 PM
To: hbase-dev@hadoop.apache.org
Subject: Re: scanner lease expired/region server shutdown

What J-D said but with ornamentation.  See below.

On Mon, Jan 25, 2010 at 7:14 PM, Kannan Muthukkaruppan
<Ka...@facebook.com> wrote:
> I was doing some brute force testing - running one instance of PerformanceEvaluation (PE) for writes, and another instance for randomReads.
>

Whats the cluster your hitting like?  That 5-node thingy?  Whats the
hardware profile.


> One of the region servers went down after a while. [This is on 0.20.2]. The region server log had things like:
>
> 2010-01-25 11:33:39,416 WARN org.apache.hadoop.hbase.regionserver.HLog: IPC Server handler 34 on 60020 took 65190ms appending an edit to hlog; editcount=27878

This is saying that it took 65 seconds to append to hdfs.  What was
going on at that time?  A fat GC in the regionserver or over in a
Datanode?  You can enable GC logging uncommenting stuff in the
hbase-env.sh.   Feed the GC log to https://gchisto.dev.java.net/
(suggested by the zookeeper lads).  Its good for finding the long
pauses.  We should find the logs around the long GC pause.  Its
probably a failed promotion that brought on the stop-the-world GC.  Or
your HDFS was struggling?

I was going to say that the PE puts the worse kind of loading on the
hbase cache -- nothing sticks around -- but looking at your numbers
below, cache seems to be working pretty well.

> 2010-01-25 11:33:39,416 DEBUG org.apache.hadoop.hbase.io.hfile.LruBlockCache: Cache Stats: Sizes: Total=1815.4441MB (1903631104), Free=581.23096MB (609464832),\
>  Max=2396.675MB (2513095936), Counts: Blocks=28216, Access=10904689, Hit=9313612, Miss=1591077, Evictions=110, Evicted=414336, Ratios: Hit Ratio=85.40924191474\
> 915%, Miss Ratio=14.590759575366974%, Evicted/Run=3766.69091796875
> 2010-01-25 11:33:39,417 WARN org.apache.hadoop.hbase.regionserver.HLog: IPC Server handler 52 on 60020 took 65195ms appending an edit to hlog; editcount=27879
> 2010-01-25 11:33:39,417 WARN org.apache.hadoop.hbase.regionserver.HLog: IPC Server handler 75 on 60020 took 65195ms appending an edit to hlog; editcount=27880
> 2010-01-25 11:33:39,417 INFO org.apache.hadoop.hbase.regionserver.HRegionServer: Scanner -5220035164735014996 lease expired
>
> This was followed by:
> 2010-01-25 11:33:39,636 WARN org.apache.zookeeper.ClientCnxn: Exception closing session 0x12652c620520001 to sun.nio.ch.SelectionKeyImpl@788ab708<ma...@788ab708>
> java.io.IOException: TIMED OUT

This is the zk timeout.

>
> And also errors like:
> 2010-01-25 11:33:39,692 ERROR org.apache.hadoop.hbase.regionserver.HRegionServer:
> org.apache.hadoop.hbase.UnknownScannerException: Name: -5220035164735014996
>

These will happen after the above.  The regionserver is on its way
down.  Probably emptied the list of outstanding regions.


> Subsequently, I restarted the region server and just started the PE in sequentialWrite mode. It seems to be off to a really slow start. For several regions, it is printing the following:
>
> 10/01/25 18:12:44 DEBUG client.HConnectionManager$TableServers: locateRegionInMeta attempt 0 of 10 failed; retrying after sleep of 1000 because: region offline: TestTable,0031841132,1264121653303
> 10/01/25 18:12:44 DEBUG client.HConnectionManager$TableServers: Removed .META.,,1 for tableName=.META. from cache because of TestTable,0032000000,99999999999999
> 10/01/25 18:12:44 DEBUG client.HConnectionManager$TableServers: Cached location for .META.,,1 is 10.129.68.212:60020
> 10/01/25 18:12:45 DEBUG client.HConnectionManager$TableServers: locateRegionInMeta attempt 1 of 10 failed; retrying after sleep of 1000 because: region offline: TestTable,0031841132,1264121653303
> 10/01/25 18:12:45 DEBUG client.HConnectionManager$TableServers: Removed .META.,,1 for tableName=.META. from cache because of TestTable,0032000000,99999999999999
> [..and so on..]
> 10/01/25 18:12:59 DEBUG client.HConnectionManager$TableServers: Cached location for .META.,,1 is 10.129.68.212:60020
> 10/01/25 18:13:07 DEBUG client.HConnectionManager$TableServers: locateRegionInMeta attempt 8 of 10 failed; retrying after sleep of 16000 because: region offline: TestTable,0031841132,1264121653303
> 10/01/25 18:13:07 DEBUG client.HConnectionManager$TableServers: Removed .META.,,1 for tableName=.META. from cache because of TestTable,0032000000,99999999999999
>
> When I scan the '.META.', I noticed that most of the regions for which I get the above error were in the middle of (?) a split. For example:
>
> TestTable,0031841132,126412 column=info:regioninfo, timestamp=1264470099796, value=REGION => {NAME => 'TestTa
>  1653303                     ble,0031841132,1264121653303', STARTKEY => '0031841132', ENDKEY => '0032057774',
>                             ENCODED => 644500792, OFFLINE => true, SPLIT => true, TABLE => {{NAME => 'TestTab
>                             le', FAMILIES => [{NAME => 'info', VERSIONS => '3', COMPRESSION => 'NONE', TTL =>
>                              '2147483647', BLOCKSIZE => '65536', IN_MEMORY => 'false', BLOCKCACHE => 'true'}]
>                             }}
>  TestTable,0031841132,126412 column=info:server, timestamp=1264121655148, value=10.129.68.214:60020
>  1653303
>  TestTable,0031841132,126412 column=info:serverstartcode, timestamp=1264121655148, value=1264109117245
>  1653303
>  TestTable,0031841132,126412 column=info:splitA, timestamp=1264470099796, value=\000\n0031978284\000\000\000\0
>  1653303                     01&g\016\357\277\275\024\"TestTable,0031841132,1264449401108\000\n0031841132\000\
>                             000\000\005\tTestTable\000\000\000\000\000\002\000\000\000\aIS_ROOT\000\000\000\0
>                             05false\000\000\000\aIS_META\000\000\000\005false\000\000\000\001\a\004info\000\0
>                             00\000\a\000\000\000\vBLOOMFILTER\000\000\000\005false\000\000\000\vCOMPRESSION\0
>                             00\000\000\004NONE\000\000\000\bVERSIONS\000\000\000\0013\000\000\000\003TTL\000\
>                             000\000\n2147483647\000\000\000\tBLOCKSIZE\000\000\000\00565536\000\000\000\tIN_M
>                             EMORY\000\000\000\005false\000\000\000\nBLOCKCACHE\000\000\000\004true\357\277\27
>                             5\337\274#
>  TestTable,0031841132,126412 column=info:splitB, timestamp=1264470099796, value=\000\n0032057774\000\000\000\0
>  1653303                     01&g\016\357\277\275\024\"TestTable,0031978284,1264449401108\000\n0031978284\000\
>                             000\000\005\tTestTable\000\000\000\000\000\002\000\000\000\aIS_ROOT\000\000\000\0
>                             05false\000\000\000\aIS_META\000\000\000\005false\000\000\000\001\a\004info\000\0
>                             00\000\a\000\000\000\vBLOOMFILTER\000\000\000\005false\000\000\000\vCOMPRESSION\0
>                             00\000\000\004NONE\000\000\000\bVERSIONS\000\000\000\0013\000\000\000\003TTL\000\
>                             000\000\n2147483647\000\000\000\tBLOCKSIZE\000\000\000\00565536\000\000\000\tIN_M
>                             EMORY\000\000\000\005false\000\000\000\nBLOCKCACHE\000\000\000\004truej\357\277\2
>                             75\312\262
>
> Any thoughts/ideas on what might be going on? Appreciate any input in helping understand what might be going on.
>

There is no line after the above?  A region with a startkey of
0031841132?  Then its a dropped edit of .META.  The parent region --
the one being split got updates which added the splitA and splitB and
its offlining but the new region didn't get inserted?  The crash
happened just at that time?

It would be interesting to take a look at the regionserver logs from
that time.  Please post if you have a moment so we can take a looksee.

Above kinda thing is what the master rewrite is about moving state
transitions up to zk so atomic over cluster moving regions through
transitions rather than as here, a multi-row update might not all go
through as things currently work.

St.Ack

> Regards,
> Kannan
>
>
>

Re: scanner lease expired/region server shutdown

Posted by Stack <st...@duboce.net>.
On Wed, Jan 27, 2010 at 2:30 PM, Kannan Muthukkaruppan
<Ka...@facebook.com> wrote:
> 2. <<<(Thinking on it, it would be easy enough to add to the master fixup
> for this condition until we do the proper fix in 0.21).>>>
>
> Do share your thoughts on that.

.META. is scanned once a minute (by default).  If we see a split
parent without daughters, wait some time for the daughters to show up,
and if they are still not present, add them (the info to add is in the
splitA and splitB columns of the parent).  Even if the updates from the
regionserver go through at a later date, they should not clash with what
is already written.

Above is a bit hacky but might help till we get all up on 0.21.
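
To make that concrete, here is a rough sketch of such a check -- not the
actual master code; the class below is made up, and it is written against
the 0.20-era client API (method names may differ slightly between
versions).  It walks .META., and for every split parent it looks up the
daughter region named inside the splitA/splitB columns; a parent whose
daughter row is missing is exactly the dangling case described above, and
a fixup would re-insert that daughter's info:regioninfo row.

import org.apache.hadoop.hbase.HBaseConfiguration;
import org.apache.hadoop.hbase.HRegionInfo;
import org.apache.hadoop.hbase.client.Get;
import org.apache.hadoop.hbase.client.HTable;
import org.apache.hadoop.hbase.client.Result;
import org.apache.hadoop.hbase.client.ResultScanner;
import org.apache.hadoop.hbase.client.Scan;
import org.apache.hadoop.hbase.util.Bytes;
import org.apache.hadoop.hbase.util.Writables;

public class FindDanglingSplitParents {
  private static final byte[] INFO   = Bytes.toBytes("info");
  private static final byte[] SPLITA = Bytes.toBytes("splitA");
  private static final byte[] SPLITB = Bytes.toBytes("splitB");

  public static void main(String[] args) throws Exception {
    HTable meta = new HTable(new HBaseConfiguration(), ".META.");
    ResultScanner scanner = meta.getScanner(new Scan().addFamily(INFO));
    try {
      for (Result parent : scanner) {
        byte[] splitA = parent.getValue(INFO, SPLITA);
        byte[] splitB = parent.getValue(INFO, SPLITB);
        if (splitA == null && splitB == null) continue; // not a split parent
        checkDaughter(meta, parent, splitA);
        checkDaughter(meta, parent, splitB);
      }
    } finally {
      scanner.close();
    }
  }

  // The daughter's region name (and hence its .META. row key) is encoded in
  // the serialized HRegionInfo stored in the parent's splitA/splitB column.
  private static void checkDaughter(HTable meta, Result parent, byte[] bytes)
      throws Exception {
    if (bytes == null) return;
    HRegionInfo daughter = Writables.getHRegionInfo(bytes);
    Result r = meta.get(new Get(daughter.getRegionName()));
    if (r == null || r.isEmpty()) {
      System.out.println("parent " + Bytes.toString(parent.getRow())
          + " has no daughter row " + daughter.getRegionNameAsString());
      // The fixup described above would Put the daughter's info:regioninfo
      // back into .META. here (after waiting a while for it to show up).
    }
  }
}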

I'm interested though in why the subsequent edits that should have added
the daughter regions did not go through.  Perhaps all handlers on the
regionserver carrying .META. were occupied?  Perhaps there was a block
on taking new writes because of a lack of resources?  Was this the
same server that was running atop an HDFS that was suffering "too many
open files"?

>
> 3. <<< Is there a public folder some place you can post them so I can pull them.>>>
>
> Stack: perhaps I'll email it to you offline as a start.

Please do.

St.Ack

RE: scanner lease expired/region server shutdown

Posted by Kannan Muthukkaruppan <Ka...@facebook.com>.
1. <<< Well, on split, parent is in .META. but offlined.  Its followed by two
entries, one for each daughter region.  I was asking if after the
offlined parent, was there not an entry for a region with same start
key as parent (the daughter that is to take over from the parent?).
If not, then it would seem that update of the parent -- making it
offlined and adding the splitA and splitB columns went through but the
addition of the daughter rows did not... which is odd.>>>>

That's what seems to have happened. For all these problem regions (about 50 out of the 400-odd regions my table had), a scan of .META. shows 5 entries: info:regioninfo (with OFFLINE=>true), info:server, info:serverstartcode, info:splitA, info:splitB.

 TestTable,0076347792,126444 column=info:regioninfo, timestamp=1264470099825, value=REGION => {NAME => 'TestTa
 7634546                     ble,0076347792,1264447634546', STARTKEY => '0076347792', ENDKEY => '', ENCODED =>
                              654466296, OFFLINE => true, SPLIT => true, TABLE => {{NAME => 'TestTable', FAMIL
                             ...
 TestTable,0076347792,126444 column=info:server, timestamp=1264447641018, value=10.129.68.214:60020
 7634546
 TestTable,0076347792,126444 column=info:serverstartcode, timestamp=1264447641018, value=1264109117245
 7634546
 TestTable,0076347792,126444 column=info:splitA, timestamp=1264470099825, value=\000\n0076724048\000\000\000\0
 7634546                     01&g\003\357\277\275\e\"TestTable,0076347792,1264448682267\000\n0076347792\000\00
 TestTable,0076347792,126444 column=info:splitB, timestamp=1264470099825, value=\000\000\000\000\000\001&g\003
 7634546                     \357\277\275\e\"TestTable,0076724048,1264448682267\000\n0076724048\000\000\000\00

There is no other entry with a start key of 0076347792.

2. <<<(Thinking on it, it would be easy enough to add to the master fixup
for this condition until we do the proper fix in 0.21).>>>

Do share your thoughts on that.

3. <<< Is there a public folder some place you can post them so I can pull them.>>>

Stack: perhaps I'll email it to you offline as a start.

regards,
Kannan
-----Original Message-----
From: saint.ack@gmail.com [mailto:saint.ack@gmail.com] On Behalf Of Stack
Sent: Tuesday, January 26, 2010 9:20 AM
To: hbase-dev@hadoop.apache.org
Subject: Re: scanner lease expired/region server shutdown

On Tue, Jan 26, 2010 at 7:01 AM, Kannan Muthukkaruppan
<Ka...@facebook.com> wrote:
> 1 Name Node/4 Data Nodes. Of the 4 DN, one is running the HBase Master, and the other three are running region servers. ZK is on all the same 5 nodes. Should ideally have separated this out. The nodes are 16GB, 4 disk machines.
>

CPUs?  4 disks is good.


> 2. I examined the HDFS datanode log on the same machine around that time the problems happened, and saw this:
>
> 2010-01-25 11:33:09,531 WARN org.apache.hadoop.hdfs.server.datanode.DataNode: DatanodeRegistration(10.129.68.212:50010, storageID=DS-1418567969-10.129.68.212-50010-1263610251776, infoPort=50075, ipcPort=\
> 50020):Got exception while serving blk_5691809099673541164_10475 to /10.129.68.212:
> java.net.SocketTimeoutException: 480000 millis timeout while waiting for channel to be ready for write. ch : java.nio.channels.SocketChannel[connected local=/10.129.68.212:50010 remote=/10.129.68.212:477\
> 29]
>        at org.apache.hadoop.net.SocketIOWithTimeout.waitForIO(SocketIOWithTimeout.java:246)
>        at org.apache.hadoop.net.SocketOutputStream.waitForWritable(SocketOutputStream.java:159)
>        at org.apache.hadoop.net.SocketOutputStream.transferToFully(SocketOutputStream.java:198)
>        at org.apache.hadoop.hdfs.server.datanode.BlockSender.sendChunks(BlockSender.java:313)
>        at org.apache.hadoop.hdfs.server.datanode.BlockSender.sendBlock(BlockSender.java:400)
>        at org.apache.hadoop.hdfs.server.datanode.DataXceiver.readBlock(DataXceiver.java:180)
>        at org.apache.hadoop.hdfs.server.datanode.DataXceiver.run(DataXceiver.java:95)
>        at java.lang.Thread.run(Thread.java:619)
>

Does this exception show in regionserver logs -- the client?  See
HADOOP-3831 for 'fix' that has DFSClient reestablishing connection.


> [For future runs, will try to turn on Java GC logging as well as per your suggestion.]

Yes. Getting GC settings right is critical as it is for any java application.



> <<< Then its a dropped edit of .META.  The parent region -- the one being split got updates which added the splitA and splitB and its offlining but the new region didn't get inserted?  The crash happened just at that time?>>>
>
> Didn't fully understand the above.
>

Well, on split, parent is in .META. but offlined.  Its followed by two
entries, one for each daughter region.  I was asking if after the
offlined parent, was there not an entry for a region with same start
key as parent (the daughter that is to take over from the parent?).
If not, then it would seem that update of the parent -- making it
offlined and adding the splitA and splitB columns went through but the
addition of the daughter rows did not... which is odd.

(Thinking on it, it would be easy enough to add to the master fixup
for this condition until we do the proper fix in 0.21).


> <<< It would be interesting to take a look at the regionserver logs from that time.  Please post if you have a moment so we can take a looksee.>>>
>
> Will do. Should I just send it as an attachment to the list? Or is their a recommended way of doing this?
>
Don't attach.  They just get dropped.  Is there a public folder some
place you can post them so I can pull them.

Thanks Kannan.
St.Ack


> regards,
> Kannan
> ________________________________________
> From: saint.ack@gmail.com [saint.ack@gmail.com] On Behalf Of Stack [stack@duboce.net]
> Sent: Monday, January 25, 2010 8:59 PM
> To: hbase-dev@hadoop.apache.org
> Subject: Re: scanner lease expired/region server shutdown
>
> What J-D said but with ornamentation.  See below.
>
> On Mon, Jan 25, 2010 at 7:14 PM, Kannan Muthukkaruppan
> <Ka...@facebook.com> wrote:
>> I was doing some brute force testing - running one instance of PerformanceEvaluation (PE) for writes, and another instance for randomReads.
>>
>
> Whats the cluster your hitting like?  That 5-node thingy?  Whats the
> hardware profile.
>
>
>> One of the region servers went down after a while. [This is on 0.20.2]. The region server log had things like:
>>
>> 2010-01-25 11:33:39,416 WARN org.apache.hadoop.hbase.regionserver.HLog: IPC Server handler 34 on 60020 took 65190ms appending an edit to hlog; editcount=27878
>
> This is saying that it took 65 seconds to append to hdfs.  What was
> going on at that time?  A fat GC in the regionserver or over in a
> Datanode?  You can enable GC logging uncommenting stuff in the
> hbase-env.sh.   Feed the GC log to https://gchisto.dev.java.net/
> (suggested by the zookeeper lads).  Its good for finding the long
> pauses.  We should find the logs around the long GC pause.  Its
> probably a failed promotion that brought on the stop-the-world GC.  Or
> your HDFS was struggling?
>
> I was going to say that the PE puts the worse kind of loading on the
> hbase cache -- nothing sticks around -- but looking at your numbers
> below, cache seems to be working pretty well.
>
>> 2010-01-25 11:33:39,416 DEBUG org.apache.hadoop.hbase.io.hfile.LruBlockCache: Cache Stats: Sizes: Total=1815.4441MB (1903631104), Free=581.23096MB (609464832),\
>>  Max=2396.675MB (2513095936), Counts: Blocks=28216, Access=10904689, Hit=9313612, Miss=1591077, Evictions=110, Evicted=414336, Ratios: Hit Ratio=85.40924191474\
>> 915%, Miss Ratio=14.590759575366974%, Evicted/Run=3766.69091796875
>> 2010-01-25 11:33:39,417 WARN org.apache.hadoop.hbase.regionserver.HLog: IPC Server handler 52 on 60020 took 65195ms appending an edit to hlog; editcount=27879
>> 2010-01-25 11:33:39,417 WARN org.apache.hadoop.hbase.regionserver.HLog: IPC Server handler 75 on 60020 took 65195ms appending an edit to hlog; editcount=27880
>> 2010-01-25 11:33:39,417 INFO org.apache.hadoop.hbase.regionserver.HRegionServer: Scanner -5220035164735014996 lease expired
>>
>> This was followed by:
>> 2010-01-25 11:33:39,636 WARN org.apache.zookeeper.ClientCnxn: Exception closing session 0x12652c620520001 to sun.nio.ch.SelectionKeyImpl@788ab708<ma...@788ab708>
>> java.io.IOException: TIMED OUT
>
> This is the zk timeout.
>
>>
>> And also errors like:
>> 2010-01-25 11:33:39,692 ERROR org.apache.hadoop.hbase.regionserver.HRegionServer:
>> org.apache.hadoop.hbase.UnknownScannerException: Name: -5220035164735014996
>>
>
> These will happen after the above.  The regionserver is on its way
> down.  Probably emptied the list of outstanding regions.
>
>
>> Subsequently, I restarted the region server and just started the PE in sequentialWrite mode. It seems to be off to a really slow start. For several regions, it is printing the following:
>>
>> 10/01/25 18:12:44 DEBUG client.HConnectionManager$TableServers: locateRegionInMeta attempt 0 of 10 failed; retrying after sleep of 1000 because: region offline: TestTable,0031841132,1264121653303
>> 10/01/25 18:12:44 DEBUG client.HConnectionManager$TableServers: Removed .META.,,1 for tableName=.META. from cache because of TestTable,0032000000,99999999999999
>> 10/01/25 18:12:44 DEBUG client.HConnectionManager$TableServers: Cached location for .META.,,1 is 10.129.68.212:60020
>> 10/01/25 18:12:45 DEBUG client.HConnectionManager$TableServers: locateRegionInMeta attempt 1 of 10 failed; retrying after sleep of 1000 because: region offline: TestTable,0031841132,1264121653303
>> 10/01/25 18:12:45 DEBUG client.HConnectionManager$TableServers: Removed .META.,,1 for tableName=.META. from cache because of TestTable,0032000000,99999999999999
>> [..and so on..]
>> 10/01/25 18:12:59 DEBUG client.HConnectionManager$TableServers: Cached location for .META.,,1 is 10.129.68.212:60020
>> 10/01/25 18:13:07 DEBUG client.HConnectionManager$TableServers: locateRegionInMeta attempt 8 of 10 failed; retrying after sleep of 16000 because: region offline: TestTable,0031841132,1264121653303
>> 10/01/25 18:13:07 DEBUG client.HConnectionManager$TableServers: Removed .META.,,1 for tableName=.META. from cache because of TestTable,0032000000,99999999999999
>>
>> When I scan the '.META.', I noticed that most of the regions for which I get the above error were in the middle of (?) a split. For example:
>>
>> TestTable,0031841132,126412 column=info:regioninfo, timestamp=1264470099796, value=REGION => {NAME => 'TestTa
>>  1653303                     ble,0031841132,1264121653303', STARTKEY => '0031841132', ENDKEY => '0032057774',
>>                             ENCODED => 644500792, OFFLINE => true, SPLIT => true, TABLE => {{NAME => 'TestTab
>>                             le', FAMILIES => [{NAME => 'info', VERSIONS => '3', COMPRESSION => 'NONE', TTL =>
>>                              '2147483647', BLOCKSIZE => '65536', IN_MEMORY => 'false', BLOCKCACHE => 'true'}]
>>                             }}
>>  TestTable,0031841132,126412 column=info:server, timestamp=1264121655148, value=10.129.68.214:60020
>>  1653303
>>  TestTable,0031841132,126412 column=info:serverstartcode, timestamp=1264121655148, value=1264109117245
>>  1653303
>>  TestTable,0031841132,126412 column=info:splitA, timestamp=1264470099796, value=\000\n0031978284\000\000\000\0
>>  1653303                     01&g\016\357\277\275\024\"TestTable,0031841132,1264449401108\000\n0031841132\000\
>>                             000\000\005\tTestTable\000\000\000\000\000\002\000\000\000\aIS_ROOT\000\000\000\0
>>                             05false\000\000\000\aIS_META\000\000\000\005false\000\000\000\001\a\004info\000\0
>>                             00\000\a\000\000\000\vBLOOMFILTER\000\000\000\005false\000\000\000\vCOMPRESSION\0
>>                             00\000\000\004NONE\000\000\000\bVERSIONS\000\000\000\0013\000\000\000\003TTL\000\
>>                             000\000\n2147483647\000\000\000\tBLOCKSIZE\000\000\000\00565536\000\000\000\tIN_M
>>                             EMORY\000\000\000\005false\000\000\000\nBLOCKCACHE\000\000\000\004true\357\277\27
>>                             5\337\274#
>>  TestTable,0031841132,126412 column=info:splitB, timestamp=1264470099796, value=\000\n0032057774\000\000\000\0
>>  1653303                     01&g\016\357\277\275\024\"TestTable,0031978284,1264449401108\000\n0031978284\000\
>>                             000\000\005\tTestTable\000\000\000\000\000\002\000\000\000\aIS_ROOT\000\000\000\0
>>                             05false\000\000\000\aIS_META\000\000\000\005false\000\000\000\001\a\004info\000\0
>>                             00\000\a\000\000\000\vBLOOMFILTER\000\000\000\005false\000\000\000\vCOMPRESSION\0
>>                             00\000\000\004NONE\000\000\000\bVERSIONS\000\000\000\0013\000\000\000\003TTL\000\
>>                             000\000\n2147483647\000\000\000\tBLOCKSIZE\000\000\000\00565536\000\000\000\tIN_M
>>                             EMORY\000\000\000\005false\000\000\000\nBLOCKCACHE\000\000\000\004truej\357\277\2
>>                             75\312\262
>>
>> Any thoughts/ideas on what might be going on? Appreciate any input in helping understand what might be going on.
>>
>
> There is no line after the above?  A region with a startkey of
> 0031841132?  Then its a dropped edit of .META.  The parent region --
> the one being split got updates which added the splitA and splitB and
> its offlining but the new region didn't get inserted?  The crash
> happened just at that time?
>
> It would be interesting to take a look at the regionserver logs from
> that time.  Please post if you have a moment so we can take a looksee.
>
> Above kinda thing is what the master rewrite is about moving state
> transitions up to zk so atomic over cluster moving regions through
> transitions rather than as here, a multi-row update might not all go
> through as things currently work.
>
> St.Ack
>
>> Regards,
>> Kannan
>>
>>
>>
>

Re: scanner lease expired/region server shutdown

Posted by Stack <st...@duboce.net>.
On Tue, Jan 26, 2010 at 7:01 AM, Kannan Muthukkaruppan
<Ka...@facebook.com> wrote:
> 1 Name Node/4 Data Nodes. Of the 4 DN, one is running the HBase Master, and the other three are running region servers. ZK is on all the same 5 nodes. Should ideally have separated this out. The nodes are 16GB, 4 disk machines.
>

CPUs?  4 disks is good.


> 2. I examined the HDFS datanode log on the same machine around that time the problems happened, and saw this:
>
> 2010-01-25 11:33:09,531 WARN org.apache.hadoop.hdfs.server.datanode.DataNode: DatanodeRegistration(10.129.68.212:50010, storageID=DS-1418567969-10.129.68.212-50010-1263610251776, infoPort=50075, ipcPort=\
> 50020):Got exception while serving blk_5691809099673541164_10475 to /10.129.68.212:
> java.net.SocketTimeoutException: 480000 millis timeout while waiting for channel to be ready for write. ch : java.nio.channels.SocketChannel[connected local=/10.129.68.212:50010 remote=/10.129.68.212:477\
> 29]
>        at org.apache.hadoop.net.SocketIOWithTimeout.waitForIO(SocketIOWithTimeout.java:246)
>        at org.apache.hadoop.net.SocketOutputStream.waitForWritable(SocketOutputStream.java:159)
>        at org.apache.hadoop.net.SocketOutputStream.transferToFully(SocketOutputStream.java:198)
>        at org.apache.hadoop.hdfs.server.datanode.BlockSender.sendChunks(BlockSender.java:313)
>        at org.apache.hadoop.hdfs.server.datanode.BlockSender.sendBlock(BlockSender.java:400)
>        at org.apache.hadoop.hdfs.server.datanode.DataXceiver.readBlock(DataXceiver.java:180)
>        at org.apache.hadoop.hdfs.server.datanode.DataXceiver.run(DataXceiver.java:95)
>        at java.lang.Thread.run(Thread.java:619)
>

Does this exception also show in the regionserver logs -- i.e., on the
client side?  See HADOOP-3831 for the 'fix' that has the DFSClient
reestablishing its connection.


> [For future runs, will try to turn on Java GC logging as well as per your suggestion.]

Yes.  Getting GC settings right is critical, as it is for any Java application.



> <<< Then its a dropped edit of .META.  The parent region -- the one being split got updates which added the splitA and splitB and its offlining but the new region didn't get inserted?  The crash happened just at that time?>>>
>
> Didn't fully understand the above.
>

Well, on split, the parent stays in .META. but is offlined.  It's followed
by two entries, one for each daughter region.  I was asking whether, after
the offlined parent, there was not an entry for a region with the same
start key as the parent (the daughter that is to take over from the
parent).  If not, then it would seem that the update of the parent --
making it offlined and adding the splitA and splitB columns -- went
through, but the addition of the daughter rows did not... which is odd.

(Thinking on it, it would be easy enough to add a fixup for this
condition to the master until we do the proper fix in 0.21.)


> <<< It would be interesting to take a look at the regionserver logs from that time.  Please post if you have a moment so we can take a looksee.>>>
>
> Will do. Should I just send it as an attachment to the list? Or is their a recommended way of doing this?
>
Don't attach.  They just get dropped.  Is there a public folder some
place you can post them so I can pull them?

Thanks Kannan.
St.Ack


> regards,
> Kannan
> ________________________________________
> From: saint.ack@gmail.com [saint.ack@gmail.com] On Behalf Of Stack [stack@duboce.net]
> Sent: Monday, January 25, 2010 8:59 PM
> To: hbase-dev@hadoop.apache.org
> Subject: Re: scanner lease expired/region server shutdown
>
> What J-D said but with ornamentation.  See below.
>
> On Mon, Jan 25, 2010 at 7:14 PM, Kannan Muthukkaruppan
> <Ka...@facebook.com> wrote:
>> I was doing some brute force testing - running one instance of PerformanceEvaluation (PE) for writes, and another instance for randomReads.
>>
>
> Whats the cluster your hitting like?  That 5-node thingy?  Whats the
> hardware profile.
>
>
>> One of the region servers went down after a while. [This is on 0.20.2]. The region server log had things like:
>>
>> 2010-01-25 11:33:39,416 WARN org.apache.hadoop.hbase.regionserver.HLog: IPC Server handler 34 on 60020 took 65190ms appending an edit to hlog; editcount=27878
>
> This is saying that it took 65 seconds to append to hdfs.  What was
> going on at that time?  A fat GC in the regionserver or over in a
> Datanode?  You can enable GC logging uncommenting stuff in the
> hbase-env.sh.   Feed the GC log to https://gchisto.dev.java.net/
> (suggested by the zookeeper lads).  Its good for finding the long
> pauses.  We should find the logs around the long GC pause.  Its
> probably a failed promotion that brought on the stop-the-world GC.  Or
> your HDFS was struggling?
>
> I was going to say that the PE puts the worse kind of loading on the
> hbase cache -- nothing sticks around -- but looking at your numbers
> below, cache seems to be working pretty well.
>
>> 2010-01-25 11:33:39,416 DEBUG org.apache.hadoop.hbase.io.hfile.LruBlockCache: Cache Stats: Sizes: Total=1815.4441MB (1903631104), Free=581.23096MB (609464832),\
>>  Max=2396.675MB (2513095936), Counts: Blocks=28216, Access=10904689, Hit=9313612, Miss=1591077, Evictions=110, Evicted=414336, Ratios: Hit Ratio=85.40924191474\
>> 915%, Miss Ratio=14.590759575366974%, Evicted/Run=3766.69091796875
>> 2010-01-25 11:33:39,417 WARN org.apache.hadoop.hbase.regionserver.HLog: IPC Server handler 52 on 60020 took 65195ms appending an edit to hlog; editcount=27879
>> 2010-01-25 11:33:39,417 WARN org.apache.hadoop.hbase.regionserver.HLog: IPC Server handler 75 on 60020 took 65195ms appending an edit to hlog; editcount=27880
>> 2010-01-25 11:33:39,417 INFO org.apache.hadoop.hbase.regionserver.HRegionServer: Scanner -5220035164735014996 lease expired
>>
>> This was followed by:
>> 2010-01-25 11:33:39,636 WARN org.apache.zookeeper.ClientCnxn: Exception closing session 0x12652c620520001 to sun.nio.ch.SelectionKeyImpl@788ab708<ma...@788ab708>
>> java.io.IOException: TIMED OUT
>
> This is the zk timeout.
>
>>
>> And also errors like:
>> 2010-01-25 11:33:39,692 ERROR org.apache.hadoop.hbase.regionserver.HRegionServer:
>> org.apache.hadoop.hbase.UnknownScannerException: Name: -5220035164735014996
>>
>
> These will happen after the above.  The regionserver is on its way
> down.  Probably emptied the list of outstanding regions.
>
>
>> Subsequently, I restarted the region server and just started the PE in sequentialWrite mode. It seems to be off to a really slow start. For several regions, it is printing the following:
>>
>> 10/01/25 18:12:44 DEBUG client.HConnectionManager$TableServers: locateRegionInMeta attempt 0 of 10 failed; retrying after sleep of 1000 because: region offline: TestTable,0031841132,1264121653303
>> 10/01/25 18:12:44 DEBUG client.HConnectionManager$TableServers: Removed .META.,,1 for tableName=.META. from cache because of TestTable,0032000000,99999999999999
>> 10/01/25 18:12:44 DEBUG client.HConnectionManager$TableServers: Cached location for .META.,,1 is 10.129.68.212:60020
>> 10/01/25 18:12:45 DEBUG client.HConnectionManager$TableServers: locateRegionInMeta attempt 1 of 10 failed; retrying after sleep of 1000 because: region offline: TestTable,0031841132,1264121653303
>> 10/01/25 18:12:45 DEBUG client.HConnectionManager$TableServers: Removed .META.,,1 for tableName=.META. from cache because of TestTable,0032000000,99999999999999
>> [..and so on..]
>> 10/01/25 18:12:59 DEBUG client.HConnectionManager$TableServers: Cached location for .META.,,1 is 10.129.68.212:60020
>> 10/01/25 18:13:07 DEBUG client.HConnectionManager$TableServers: locateRegionInMeta attempt 8 of 10 failed; retrying after sleep of 16000 because: region offline: TestTable,0031841132,1264121653303
>> 10/01/25 18:13:07 DEBUG client.HConnectionManager$TableServers: Removed .META.,,1 for tableName=.META. from cache because of TestTable,0032000000,99999999999999
>>
>> When I scan the '.META.', I noticed that most of the regions for which I get the above error were in the middle of (?) a split. For example:
>>
>> TestTable,0031841132,126412 column=info:regioninfo, timestamp=1264470099796, value=REGION => {NAME => 'TestTa
>>  1653303                     ble,0031841132,1264121653303', STARTKEY => '0031841132', ENDKEY => '0032057774',
>>                             ENCODED => 644500792, OFFLINE => true, SPLIT => true, TABLE => {{NAME => 'TestTab
>>                             le', FAMILIES => [{NAME => 'info', VERSIONS => '3', COMPRESSION => 'NONE', TTL =>
>>                              '2147483647', BLOCKSIZE => '65536', IN_MEMORY => 'false', BLOCKCACHE => 'true'}]
>>                             }}
>>  TestTable,0031841132,126412 column=info:server, timestamp=1264121655148, value=10.129.68.214:60020
>>  1653303
>>  TestTable,0031841132,126412 column=info:serverstartcode, timestamp=1264121655148, value=1264109117245
>>  1653303
>>  TestTable,0031841132,126412 column=info:splitA, timestamp=1264470099796, value=\000\n0031978284\000\000\000\0
>>  1653303                     01&g\016\357\277\275\024\"TestTable,0031841132,1264449401108\000\n0031841132\000\
>>                             000\000\005\tTestTable\000\000\000\000\000\002\000\000\000\aIS_ROOT\000\000\000\0
>>                             05false\000\000\000\aIS_META\000\000\000\005false\000\000\000\001\a\004info\000\0
>>                             00\000\a\000\000\000\vBLOOMFILTER\000\000\000\005false\000\000\000\vCOMPRESSION\0
>>                             00\000\000\004NONE\000\000\000\bVERSIONS\000\000\000\0013\000\000\000\003TTL\000\
>>                             000\000\n2147483647\000\000\000\tBLOCKSIZE\000\000\000\00565536\000\000\000\tIN_M
>>                             EMORY\000\000\000\005false\000\000\000\nBLOCKCACHE\000\000\000\004true\357\277\27
>>                             5\337\274#
>>  TestTable,0031841132,126412 column=info:splitB, timestamp=1264470099796, value=\000\n0032057774\000\000\000\0
>>  1653303                     01&g\016\357\277\275\024\"TestTable,0031978284,1264449401108\000\n0031978284\000\
>>                             000\000\005\tTestTable\000\000\000\000\000\002\000\000\000\aIS_ROOT\000\000\000\0
>>                             05false\000\000\000\aIS_META\000\000\000\005false\000\000\000\001\a\004info\000\0
>>                             00\000\a\000\000\000\vBLOOMFILTER\000\000\000\005false\000\000\000\vCOMPRESSION\0
>>                             00\000\000\004NONE\000\000\000\bVERSIONS\000\000\000\0013\000\000\000\003TTL\000\
>>                             000\000\n2147483647\000\000\000\tBLOCKSIZE\000\000\000\00565536\000\000\000\tIN_M
>>                             EMORY\000\000\000\005false\000\000\000\nBLOCKCACHE\000\000\000\004truej\357\277\2
>>                             75\312\262
>>
>> Any thoughts/ideas on what might be going on? Appreciate any input in helping understand what might be going on.
>>
>
> There is no line after the above?  A region with a startkey of
> 0031841132?  Then it's a dropped edit of .META.  The parent region --
> the one being split -- got the updates that added the splitA and splitB
> columns and its offlining, but the new region didn't get inserted?  The
> crash happened just at that time?
>
> It would be interesting to take a look at the regionserver logs from
> that time.  Please post if you have a moment so we can take a looksee.
>
> The above kind of thing is what the master rewrite is about: moving
> state transitions up to zk so they are atomic across the cluster as
> regions move through transitions, rather than, as here, a multi-row
> update that might not all go through as things currently work.
>
> St.Ack
>
>> Regards,
>> Kannan
>>
>>
>>
>

RE: scanner lease expired/region server shutdown

Posted by Kannan Muthukkaruppan <Ka...@facebook.com>.
1. Yes, it is a 5-node setup.

1 Name Node / 4 Data Nodes. Of the 4 DNs, one is running the HBase Master, and the other three are running region servers. ZK is on all the same 5 nodes; we should ideally have separated it out. The nodes are 16GB, 4-disk machines.

2. I examined the HDFS datanode log on the same machine around the time the problems happened, and saw this:

2010-01-25 11:33:09,531 WARN org.apache.hadoop.hdfs.server.datanode.DataNode: DatanodeRegistration(10.129.68.212:50010, storageID=DS-1418567969-10.129.68.212-50010-1263610251776, infoPort=50075, ipcPort=\
50020):Got exception while serving blk_5691809099673541164_10475 to /10.129.68.212:
java.net.SocketTimeoutException: 480000 millis timeout while waiting for channel to be ready for write. ch : java.nio.channels.SocketChannel[connected local=/10.129.68.212:50010 remote=/10.129.68.212:477\
29]
        at org.apache.hadoop.net.SocketIOWithTimeout.waitForIO(SocketIOWithTimeout.java:246)
        at org.apache.hadoop.net.SocketOutputStream.waitForWritable(SocketOutputStream.java:159)
        at org.apache.hadoop.net.SocketOutputStream.transferToFully(SocketOutputStream.java:198)
        at org.apache.hadoop.hdfs.server.datanode.BlockSender.sendChunks(BlockSender.java:313)
        at org.apache.hadoop.hdfs.server.datanode.BlockSender.sendBlock(BlockSender.java:400)
        at org.apache.hadoop.hdfs.server.datanode.DataXceiver.readBlock(DataXceiver.java:180)
        at org.apache.hadoop.hdfs.server.datanode.DataXceiver.run(DataXceiver.java:95)
        at java.lang.Thread.run(Thread.java:619)

[For future runs, I will try to turn on Java GC logging as well, as per your suggestion.]

3.  You wrote <<< There is no line after the above?  A region with a startkey of 0031841132?  >>>

I just cut-and-pasted a section of the PE log and the scan of '.META.' from the shell. But there were many such regions for which the PE client reported errors of the form:

10/01/25 12:31:37 DEBUG client.HConnectionManager$TableServers: locateRegionInMeta attempt 1 of 10 failed; retrying after sleep of 1000 because: Connection refused
10/01/25 12:31:37 DEBUG client.HConnectionManager$TableServers: Removed .META.,,1 for tableName=.META. from cache because of TestTable,0015411453,99999999999999
10/01/25 12:31:37 DEBUG client.HConnectionManager$TableServers: Cached location for .META.,,1 is 10.129.68.212:60020

and a scan of '.META.' revealed that those regions, such as 'TestTable,0015411453,99999999999999', had splitA & splitB portions in .META.
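
For anyone who wants to do the same check programmatically rather than
from the shell, something along the following lines should work against
the 0.20-era Java client. It is only a rough sketch (class and method
names from memory), but it scans .META. starting at the TestTable rows
and prints any parent region that is marked offline and split:

import org.apache.hadoop.hbase.HBaseConfiguration;
import org.apache.hadoop.hbase.HRegionInfo;
import org.apache.hadoop.hbase.client.HTable;
import org.apache.hadoop.hbase.client.Result;
import org.apache.hadoop.hbase.client.ResultScanner;
import org.apache.hadoop.hbase.client.Scan;
import org.apache.hadoop.hbase.util.Bytes;
import org.apache.hadoop.hbase.util.Writables;

public class FindSplitParents {
  public static void main(String[] args) throws Exception {
    HBaseConfiguration conf = new HBaseConfiguration();
    HTable meta = new HTable(conf, ".META.");

    // Position the scan at the first TestTable region row in .META.
    Scan scan = new Scan(Bytes.toBytes("TestTable,"));
    scan.addFamily(Bytes.toBytes("info"));

    ResultScanner scanner = meta.getScanner(scan);
    try {
      for (Result row : scanner) {
        byte[] bytes = row.getValue(Bytes.toBytes("info"), Bytes.toBytes("regioninfo"));
        if (bytes == null) continue;
        HRegionInfo hri = Writables.getHRegionInfo(bytes);
        // Stop once the scan walks past TestTable's rows.
        if (!hri.getTableDesc().getNameAsString().equals("TestTable")) break;
        boolean hasSplitA = row.getValue(Bytes.toBytes("info"), Bytes.toBytes("splitA")) != null;
        if (hri.isOffline() && hri.isSplit()) {
          System.out.println("split parent: " + hri.getRegionNameAsString()
              + " (splitA present: " + hasSplitA + ")");
        }
      }
    } finally {
      scanner.close();
    }
  }
}

If the daughter regions named in splitA/splitB also show up as their own
rows further down the scan, the split bookkeeping completed; if they
don't, that would fit the dropped-edit theory discussed below.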

<<< Then it's a dropped edit of .META.  The parent region -- the one being split -- got the updates that added the splitA and splitB columns and its offlining, but the new region didn't get inserted?  The crash happened just at that time?>>>

Didn't fully understand the above.

<<< It would be interesting to take a look at the regionserver logs from that time.  Please post if you have a moment so we can take a looksee.>>>

Will do. Should I just send it as an attachment to the list? Or is there a recommended way of doing this?

regards,
Kannan
________________________________________
From: saint.ack@gmail.com [saint.ack@gmail.com] On Behalf Of Stack [stack@duboce.net]
Sent: Monday, January 25, 2010 8:59 PM
To: hbase-dev@hadoop.apache.org
Subject: Re: scanner lease expired/region server shutdown

What J-D said but with ornamentation.  See below.

On Mon, Jan 25, 2010 at 7:14 PM, Kannan Muthukkaruppan
<Ka...@facebook.com> wrote:
> I was doing some brute force testing - running one instance of PerformanceEvaluation (PE) for writes, and another instance for randomReads.
>

What's the cluster you're hitting like?  That 5-node thingy?  What's the
hardware profile?


> One of the region servers went down after a while. [This is on 0.20.2]. The region server log had things like:
>
> 2010-01-25 11:33:39,416 WARN org.apache.hadoop.hbase.regionserver.HLog: IPC Server handler 34 on 60020 took 65190ms appending an edit to hlog; editcount=27878

This is saying that it took 65 seconds to append to HDFS.  What was
going on at that time?  A fat GC in the regionserver or over in a
Datanode?  You can enable GC logging by uncommenting stuff in
hbase-env.sh.  Feed the GC log to https://gchisto.dev.java.net/
(suggested by the zookeeper lads).  It's good for finding the long
pauses.  We should look at the logs around the long GC pause.  It's
probably a failed promotion that brought on the stop-the-world GC.  Or
your HDFS was struggling?

I was going to say that the PE puts the worst kind of loading on the
hbase cache -- nothing sticks around -- but looking at your numbers
below, the cache seems to be working pretty well.

> 2010-01-25 11:33:39,416 DEBUG org.apache.hadoop.hbase.io.hfile.LruBlockCache: Cache Stats: Sizes: Total=1815.4441MB (1903631104), Free=581.23096MB (609464832),\
>  Max=2396.675MB (2513095936), Counts: Blocks=28216, Access=10904689, Hit=9313612, Miss=1591077, Evictions=110, Evicted=414336, Ratios: Hit Ratio=85.40924191474\
> 915%, Miss Ratio=14.590759575366974%, Evicted/Run=3766.69091796875
> 2010-01-25 11:33:39,417 WARN org.apache.hadoop.hbase.regionserver.HLog: IPC Server handler 52 on 60020 took 65195ms appending an edit to hlog; editcount=27879
> 2010-01-25 11:33:39,417 WARN org.apache.hadoop.hbase.regionserver.HLog: IPC Server handler 75 on 60020 took 65195ms appending an edit to hlog; editcount=27880
> 2010-01-25 11:33:39,417 INFO org.apache.hadoop.hbase.regionserver.HRegionServer: Scanner -5220035164735014996 lease expired
>
> This was followed by:
> 2010-01-25 11:33:39,636 WARN org.apache.zookeeper.ClientCnxn: Exception closing session 0x12652c620520001 to sun.nio.ch.SelectionKeyImpl@788ab708
> java.io.IOException: TIMED OUT

This is the zk timeout.

>
> And also errors like:
> 2010-01-25 11:33:39,692 ERROR org.apache.hadoop.hbase.regionserver.HRegionServer:
> org.apache.hadoop.hbase.UnknownScannerException: Name: -5220035164735014996
>

These will happen after the above.  The regionserver is on its way
down.  Probably emptied the list of outstanding regions.


> Subsequently, I restarted the region server and just started the PE in sequentialWrite mode. It seems to be off to a really slow start. For several regions, it is printing the following:
>
> 10/01/25 18:12:44 DEBUG client.HConnectionManager$TableServers: locateRegionInMeta attempt 0 of 10 failed; retrying after sleep of 1000 because: region offline: TestTable,0031841132,1264121653303
> 10/01/25 18:12:44 DEBUG client.HConnectionManager$TableServers: Removed .META.,,1 for tableName=.META. from cache because of TestTable,0032000000,99999999999999
> 10/01/25 18:12:44 DEBUG client.HConnectionManager$TableServers: Cached location for .META.,,1 is 10.129.68.212:60020
> 10/01/25 18:12:45 DEBUG client.HConnectionManager$TableServers: locateRegionInMeta attempt 1 of 10 failed; retrying after sleep of 1000 because: region offline: TestTable,0031841132,1264121653303
> 10/01/25 18:12:45 DEBUG client.HConnectionManager$TableServers: Removed .META.,,1 for tableName=.META. from cache because of TestTable,0032000000,99999999999999
> [..and so on..]
> 10/01/25 18:12:59 DEBUG client.HConnectionManager$TableServers: Cached location for .META.,,1 is 10.129.68.212:60020
> 10/01/25 18:13:07 DEBUG client.HConnectionManager$TableServers: locateRegionInMeta attempt 8 of 10 failed; retrying after sleep of 16000 because: region offline: TestTable,0031841132,1264121653303
> 10/01/25 18:13:07 DEBUG client.HConnectionManager$TableServers: Removed .META.,,1 for tableName=.META. from cache because of TestTable,0032000000,99999999999999
>
> When I scan the '.META.', I noticed that most of the regions for which I get the above error were in the middle of (?) a split. For example:
>
> TestTable,0031841132,126412 column=info:regioninfo, timestamp=1264470099796, value=REGION => {NAME => 'TestTa
>  1653303                     ble,0031841132,1264121653303', STARTKEY => '0031841132', ENDKEY => '0032057774',
>                             ENCODED => 644500792, OFFLINE => true, SPLIT => true, TABLE => {{NAME => 'TestTab
>                             le', FAMILIES => [{NAME => 'info', VERSIONS => '3', COMPRESSION => 'NONE', TTL =>
>                              '2147483647', BLOCKSIZE => '65536', IN_MEMORY => 'false', BLOCKCACHE => 'true'}]
>                             }}
>  TestTable,0031841132,126412 column=info:server, timestamp=1264121655148, value=10.129.68.214:60020
>  1653303
>  TestTable,0031841132,126412 column=info:serverstartcode, timestamp=1264121655148, value=1264109117245
>  1653303
>  TestTable,0031841132,126412 column=info:splitA, timestamp=1264470099796, value=\000\n0031978284\000\000\000\0
>  1653303                     01&g\016\357\277\275\024\"TestTable,0031841132,1264449401108\000\n0031841132\000\
>                             000\000\005\tTestTable\000\000\000\000\000\002\000\000\000\aIS_ROOT\000\000\000\0
>                             05false\000\000\000\aIS_META\000\000\000\005false\000\000\000\001\a\004info\000\0
>                             00\000\a\000\000\000\vBLOOMFILTER\000\000\000\005false\000\000\000\vCOMPRESSION\0
>                             00\000\000\004NONE\000\000\000\bVERSIONS\000\000\000\0013\000\000\000\003TTL\000\
>                             000\000\n2147483647\000\000\000\tBLOCKSIZE\000\000\000\00565536\000\000\000\tIN_M
>                             EMORY\000\000\000\005false\000\000\000\nBLOCKCACHE\000\000\000\004true\357\277\27
>                             5\337\274#
>  TestTable,0031841132,126412 column=info:splitB, timestamp=1264470099796, value=\000\n0032057774\000\000\000\0
>  1653303                     01&g\016\357\277\275\024\"TestTable,0031978284,1264449401108\000\n0031978284\000\
>                             000\000\005\tTestTable\000\000\000\000\000\002\000\000\000\aIS_ROOT\000\000\000\0
>                             05false\000\000\000\aIS_META\000\000\000\005false\000\000\000\001\a\004info\000\0
>                             00\000\a\000\000\000\vBLOOMFILTER\000\000\000\005false\000\000\000\vCOMPRESSION\0
>                             00\000\000\004NONE\000\000\000\bVERSIONS\000\000\000\0013\000\000\000\003TTL\000\
>                             000\000\n2147483647\000\000\000\tBLOCKSIZE\000\000\000\00565536\000\000\000\tIN_M
>                             EMORY\000\000\000\005false\000\000\000\nBLOCKCACHE\000\000\000\004truej\357\277\2
>                             75\312\262
>
> Any thoughts/ideas on what might be going on? Appreciate any input in helping understand what might be going on.
>

There is no line after the above?  A region with a startkey of
0031841132?  Then it's a dropped edit of .META.  The parent region --
the one being split -- got the updates that added the splitA and splitB
columns and its offlining, but the new region didn't get inserted?  The
crash happened just at that time?

It would be interesting to take a look at the regionserver logs from
that time.  Please post if you have a moment so we can take a looksee.

The above kind of thing is what the master rewrite is about: moving
state transitions up to zk so they are atomic across the cluster as
regions move through transitions, rather than, as here, a multi-row
update that might not all go through as things currently work.

St.Ack

> Regards,
> Kannan
>
>
>

Re: scanner lease expired/region server shutdown

Posted by Stack <st...@duboce.net>.
What J-D said but with ornamentation.  See below.

On Mon, Jan 25, 2010 at 7:14 PM, Kannan Muthukkaruppan
<Ka...@facebook.com> wrote:
> I was doing some brute force testing - running one instance of PerformanceEvaluation (PE) for writes, and another instance for randomReads.
>

What's the cluster you're hitting like?  That 5-node thingy?  What's the
hardware profile?


> One of the region servers went down after a while. [This is on 0.20.2]. The region server log had things like:
>
> 2010-01-25 11:33:39,416 WARN org.apache.hadoop.hbase.regionserver.HLog: IPC Server handler 34 on 60020 took 65190ms appending an edit to hlog; editcount=27878

This is saying that it took 65 seconds to append to HDFS.  What was
going on at that time?  A fat GC in the regionserver or over in a
Datanode?  You can enable GC logging by uncommenting stuff in
hbase-env.sh.  Feed the GC log to https://gchisto.dev.java.net/
(suggested by the zookeeper lads).  It's good for finding the long
pauses.  We should look at the logs around the long GC pause.  It's
probably a failed promotion that brought on the stop-the-world GC.  Or
your HDFS was struggling?

I was going to say that the PE puts the worst kind of loading on the
hbase cache -- nothing sticks around -- but looking at your numbers
below, the cache seems to be working pretty well.

> 2010-01-25 11:33:39,416 DEBUG org.apache.hadoop.hbase.io.hfile.LruBlockCache: Cache Stats: Sizes: Total=1815.4441MB (1903631104), Free=581.23096MB (609464832),\
>  Max=2396.675MB (2513095936), Counts: Blocks=28216, Access=10904689, Hit=9313612, Miss=1591077, Evictions=110, Evicted=414336, Ratios: Hit Ratio=85.40924191474\
> 915%, Miss Ratio=14.590759575366974%, Evicted/Run=3766.69091796875
> 2010-01-25 11:33:39,417 WARN org.apache.hadoop.hbase.regionserver.HLog: IPC Server handler 52 on 60020 took 65195ms appending an edit to hlog; editcount=27879
> 2010-01-25 11:33:39,417 WARN org.apache.hadoop.hbase.regionserver.HLog: IPC Server handler 75 on 60020 took 65195ms appending an edit to hlog; editcount=27880
> 2010-01-25 11:33:39,417 INFO org.apache.hadoop.hbase.regionserver.HRegionServer: Scanner -5220035164735014996 lease expired
>
> This was followed by:
> 2010-01-25 11:33:39,636 WARN org.apache.zookeeper.ClientCnxn: Exception closing session 0x12652c620520001 to sun.nio.ch.SelectionKeyImpl@788ab708
> java.io.IOException: TIMED OUT

This is the zk timeout.

>
> And also errors like:
> 2010-01-25 11:33:39,692 ERROR org.apache.hadoop.hbase.regionserver.HRegionServer:
> org.apache.hadoop.hbase.UnknownScannerException: Name: -5220035164735014996
>

These will happen after the above.  The regionserver is on its way
down.  Probably emptied the list of outstanding regions.
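
For a bit more context on where the client actually trips over this:
the regionserver keeps an open scanner alive only for the scanner lease
period between next() calls, so once the server aborts and forgets its
scanners (or a client stalls between calls for longer than the lease),
the following next() fails with UnknownScannerException. A minimal
sketch of the usual client-side scan pattern is below -- illustrative
only, with the table and family names assumed from the PE schema:

import java.io.IOException;
import org.apache.hadoop.hbase.HBaseConfiguration;
import org.apache.hadoop.hbase.client.HTable;
import org.apache.hadoop.hbase.client.Result;
import org.apache.hadoop.hbase.client.ResultScanner;
import org.apache.hadoop.hbase.client.Scan;
import org.apache.hadoop.hbase.util.Bytes;

public class ScanLeaseExample {
  public static void main(String[] args) throws IOException {
    HBaseConfiguration conf = new HBaseConfiguration();
    HTable table = new HTable(conf, "TestTable");  // table/family assumed from the PE run
    Scan scan = new Scan();
    scan.addFamily(Bytes.toBytes("info"));

    ResultScanner scanner = table.getScanner(scan);
    try {
      Result r;
      // Each next() goes back to the regionserver hosting the current region,
      // which keeps the open scanner only for the lease period between calls.
      // If processing stalls longer than that, or the server goes down and
      // drops its scanner list, this call fails with UnknownScannerException.
      while ((r = scanner.next()) != null) {
        // process r ...
      }
    } finally {
      // Closing promptly releases the lease instead of waiting for it to expire.
      scanner.close();
    }
  }
}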


> Subsequently, I restarted the region server and just started the PE in sequentialWrite mode. It seems to be off to a really slow start. For several regions, it is printing the following:
>
> 10/01/25 18:12:44 DEBUG client.HConnectionManager$TableServers: locateRegionInMeta attempt 0 of 10 failed; retrying after sleep of 1000 because: region offline: TestTable,0031841132,1264121653303
> 10/01/25 18:12:44 DEBUG client.HConnectionManager$TableServers: Removed .META.,,1 for tableName=.META. from cache because of TestTable,0032000000,99999999999999
> 10/01/25 18:12:44 DEBUG client.HConnectionManager$TableServers: Cached location for .META.,,1 is 10.129.68.212:60020
> 10/01/25 18:12:45 DEBUG client.HConnectionManager$TableServers: locateRegionInMeta attempt 1 of 10 failed; retrying after sleep of 1000 because: region offline: TestTable,0031841132,1264121653303
> 10/01/25 18:12:45 DEBUG client.HConnectionManager$TableServers: Removed .META.,,1 for tableName=.META. from cache because of TestTable,0032000000,99999999999999
> [..and so on..]
> 10/01/25 18:12:59 DEBUG client.HConnectionManager$TableServers: Cached location for .META.,,1 is 10.129.68.212:60020
> 10/01/25 18:13:07 DEBUG client.HConnectionManager$TableServers: locateRegionInMeta attempt 8 of 10 failed; retrying after sleep of 16000 because: region offline: TestTable,0031841132,1264121653303
> 10/01/25 18:13:07 DEBUG client.HConnectionManager$TableServers: Removed .META.,,1 for tableName=.META. from cache because of TestTable,0032000000,99999999999999
>
> When I scan the '.META.', I noticed that most of the regions for which I get the above error were in the middle of (?) a split. For example:
>
> TestTable,0031841132,126412 column=info:regioninfo, timestamp=1264470099796, value=REGION => {NAME => 'TestTa
>  1653303                     ble,0031841132,1264121653303', STARTKEY => '0031841132', ENDKEY => '0032057774',
>                             ENCODED => 644500792, OFFLINE => true, SPLIT => true, TABLE => {{NAME => 'TestTab
>                             le', FAMILIES => [{NAME => 'info', VERSIONS => '3', COMPRESSION => 'NONE', TTL =>
>                              '2147483647', BLOCKSIZE => '65536', IN_MEMORY => 'false', BLOCKCACHE => 'true'}]
>                             }}
>  TestTable,0031841132,126412 column=info:server, timestamp=1264121655148, value=10.129.68.214:60020
>  1653303
>  TestTable,0031841132,126412 column=info:serverstartcode, timestamp=1264121655148, value=1264109117245
>  1653303
>  TestTable,0031841132,126412 column=info:splitA, timestamp=1264470099796, value=\000\n0031978284\000\000\000\0
>  1653303                     01&g\016\357\277\275\024\"TestTable,0031841132,1264449401108\000\n0031841132\000\
>                             000\000\005\tTestTable\000\000\000\000\000\002\000\000\000\aIS_ROOT\000\000\000\0
>                             05false\000\000\000\aIS_META\000\000\000\005false\000\000\000\001\a\004info\000\0
>                             00\000\a\000\000\000\vBLOOMFILTER\000\000\000\005false\000\000\000\vCOMPRESSION\0
>                             00\000\000\004NONE\000\000\000\bVERSIONS\000\000\000\0013\000\000\000\003TTL\000\
>                             000\000\n2147483647\000\000\000\tBLOCKSIZE\000\000\000\00565536\000\000\000\tIN_M
>                             EMORY\000\000\000\005false\000\000\000\nBLOCKCACHE\000\000\000\004true\357\277\27
>                             5\337\274#
>  TestTable,0031841132,126412 column=info:splitB, timestamp=1264470099796, value=\000\n0032057774\000\000\000\0
>  1653303                     01&g\016\357\277\275\024\"TestTable,0031978284,1264449401108\000\n0031978284\000\
>                             000\000\005\tTestTable\000\000\000\000\000\002\000\000\000\aIS_ROOT\000\000\000\0
>                             05false\000\000\000\aIS_META\000\000\000\005false\000\000\000\001\a\004info\000\0
>                             00\000\a\000\000\000\vBLOOMFILTER\000\000\000\005false\000\000\000\vCOMPRESSION\0
>                             00\000\000\004NONE\000\000\000\bVERSIONS\000\000\000\0013\000\000\000\003TTL\000\
>                             000\000\n2147483647\000\000\000\tBLOCKSIZE\000\000\000\00565536\000\000\000\tIN_M
>                             EMORY\000\000\000\005false\000\000\000\nBLOCKCACHE\000\000\000\004truej\357\277\2
>                             75\312\262
>
> Any thoughts/ideas on what might be going on? Appreciate any input in helping understand what might be going on.
>

There is no line after the above?  A region with a startkey of
0031841132?  Then it's a dropped edit of .META.  The parent region --
the one being split -- got the updates that added the splitA and splitB
columns and its offlining, but the new region didn't get inserted?  The
crash happened just at that time?
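
To make the failure mode concrete: the split bookkeeping amounts to
several separate edits against .META. -- rewrite the parent's row as
offline/split with info:splitA and info:splitB, then insert one row per
daughter -- and nothing ties those edits together. The sketch below is
only a schematic of that sequence expressed with the ordinary client
Put API; it is not the actual regionserver code path, and the
serialized HRegionInfo values are stubbed out. The row keys are the
ones visible in the .META. scan above.

// Schematic only: shows why a split is a multi-row, non-atomic .META. update.
// Do not run this against a live .META. table.
import org.apache.hadoop.hbase.HBaseConfiguration;
import org.apache.hadoop.hbase.client.HTable;
import org.apache.hadoop.hbase.client.Put;
import org.apache.hadoop.hbase.util.Bytes;

public class SplitMetaSketch {
  public static void main(String[] args) throws Exception {
    HBaseConfiguration conf = new HBaseConfiguration();
    HTable meta = new HTable(conf, ".META.");
    byte[] info = Bytes.toBytes("info");

    // Edit 1: the parent row gets an offline/split regioninfo plus the
    // splitA/splitB pointers to its daughters.
    Put parent = new Put(Bytes.toBytes("TestTable,0031841132,1264121653303"));
    parent.add(info, Bytes.toBytes("regioninfo"), Bytes.toBytes("<offline+split HRegionInfo>"));
    parent.add(info, Bytes.toBytes("splitA"), Bytes.toBytes("<daughter A HRegionInfo>"));
    parent.add(info, Bytes.toBytes("splitB"), Bytes.toBytes("<daughter B HRegionInfo>"));
    meta.put(parent);

    // A crash here leaves an offline parent with no daughter rows -- which is
    // exactly what the client's "region offline" retries above look like.

    // Edits 2 and 3: one new row per daughter region.
    Put daughterA = new Put(Bytes.toBytes("TestTable,0031841132,1264449401108"));
    daughterA.add(info, Bytes.toBytes("regioninfo"), Bytes.toBytes("<daughter A HRegionInfo>"));
    meta.put(daughterA);

    Put daughterB = new Put(Bytes.toBytes("TestTable,0031978284,1264449401108"));
    daughterB.add(info, Bytes.toBytes("regioninfo"), Bytes.toBytes("<daughter B HRegionInfo>"));
    meta.put(daughterB);
  }
}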

It would be interesting to take a look at the regionserver logs from
that time.  Please post if you have a moment so we can take a looksee.

The above kind of thing is what the master rewrite is about: moving
state transitions up to zk so they are atomic across the cluster as
regions move through transitions, rather than, as here, a multi-row
update that might not all go through as things currently work.

St.Ack

> Regards,
> Kannan
>
>
>