Posted to user@hbase.apache.org by Chris Bohme <ch...@pinkmatter.com> on 2011/05/11 15:30:54 UTC

ArrayIndexOutOfBoundsException in FSOutputSummer.write()

Dear community,

 

We are running a test on a 5-node cluster with a table of about 50 million
rows (writes and reads). At some point we get the following exception on 2
of the region servers:

 

2011-05-11 14:18:28,660 INFO org.apache.hadoop.hbase.regionserver.Store: Started compaction of 3 file(s) in cf=Family1 into hdfs://eagle1:9000/hbase/LongTable/167e7b292cc45b9face9a9cb7d86384c/.tmp, seqid=66246, totalSize=64.2m

2011-05-11 14:18:28,661 DEBUG org.apache.hadoop.hbase.regionserver.Store: Compacting hdfs://eagle1:9000/hbase/LongTable/167e7b292cc45b9face9a9cb7d86384c/Family1/7884224173883345569, keycount=790840, bloomtype=NONE, size=38.5m

2011-05-11 14:18:28,661 DEBUG org.apache.hadoop.hbase.regionserver.Store: Compacting hdfs://eagle1:9000/hbase/LongTable/167e7b292cc45b9face9a9cb7d86384c/Family1/5160949580594728531, keycount=263370, bloomtype=NONE, size=12.8m

2011-05-11 14:18:28,661 DEBUG org.apache.hadoop.hbase.regionserver.Store: Compacting hdfs://eagle1:9000/hbase/LongTable/167e7b292cc45b9face9a9cb7d86384c/Family1/7505588204602186903, keycount=263900, bloomtype=NONE, size=12.8m

2011-05-11 14:18:30,011 DEBUG org.apache.hadoop.hbase.regionserver.HRegion: Flush requested on LongTable,\x00\x00\x00\x00\x01\xC9\xD5\x13,1305115816217.20a05ebff2597ae6a63e31a5e57602dc.

2011-05-11 14:18:30,011 DEBUG org.apache.hadoop.hbase.regionserver.HRegion: Started memstore flush for LongTable,\x00\x00\x00\x00\x01\xC9\xD5\x13,1305115816217.20a05ebff2597ae6a63e31a5e57602dc., current region memstore size 64.2m

2011-05-11 14:18:30,011 DEBUG org.apache.hadoop.hbase.regionserver.HRegion: Finished snapshotting, commencing flushing stores

2011-05-11 14:18:31,067 FATAL org.apache.hadoop.hbase.regionserver.HRegionServer: ABORTING region server serverName=eagle5.pinkmatter.local,60020,1305111886513, load=(requests=20457, regions=11, usedHeap=934, maxHeap=4087): Replay of HLog required. Forcing server shutdown
org.apache.hadoop.hbase.DroppedSnapshotException: region: LongTable,\x00\x00\x00\x00\x01\xC9\xD5\x13,1305115816217.20a05ebff2597ae6a63e31a5e57602dc.
       at org.apache.hadoop.hbase.regionserver.HRegion.internalFlushcache(HRegion.java:995)
       at org.apache.hadoop.hbase.regionserver.HRegion.internalFlushcache(HRegion.java:900)
       at org.apache.hadoop.hbase.regionserver.HRegion.flushcache(HRegion.java:852)
       at org.apache.hadoop.hbase.regionserver.MemStoreFlusher.flushRegion(MemStoreFlusher.java:392)
       at org.apache.hadoop.hbase.regionserver.MemStoreFlusher.flushRegion(MemStoreFlusher.java:366)
       at org.apache.hadoop.hbase.regionserver.MemStoreFlusher.run(MemStoreFlusher.java:240)
Caused by: java.lang.ArrayIndexOutOfBoundsException
       at org.apache.hadoop.fs.FSOutputSummer.write(FSOutputSummer.java:83)
       at org.apache.hadoop.fs.FSDataOutputStream$PositionCache.write(FSDataOutputStream.java:49)
       at java.io.DataOutputStream.write(DataOutputStream.java:90)
       at java.io.DataOutputStream.write(DataOutputStream.java:90)
       at org.apache.hadoop.hbase.io.hfile.HFile$Writer.append(HFile.java:544)
       at org.apache.hadoop.hbase.io.hfile.HFile$Writer.append(HFile.java:501)
       at org.apache.hadoop.hbase.regionserver.StoreFile$Writer.append(StoreFile.java:836)
       at org.apache.hadoop.hbase.regionserver.Store.internalFlushCache(Store.java:479)
       at org.apache.hadoop.hbase.regionserver.Store.flushCache(Store.java:448)
       at org.apache.hadoop.hbase.regionserver.Store.access$100(Store.java:81)
       at org.apache.hadoop.hbase.regionserver.Store$StoreFlusherImpl.flushCache(Store.java:1513)
       at org.apache.hadoop.hbase.regionserver.HRegion.internalFlushcache(HRegion.java:973)
       ... 5 more

2011-05-11 14:18:31,067 INFO org.apache.hadoop.hbase.regionserver.HRegionServer: Dump of metrics: request=4233.9, regions=11, stores=22, storefiles=48, storefileIndexSize=8, memstoreSize=483, compactionQueueSize=0, flushQueueSize=0, usedHeap=941, maxHeap=4087, blockCacheSize=412883432, blockCacheFree=444366808, blockCacheCount=6172, blockCacheHitCount=6181, blockCacheMissCount=556608, blockCacheEvictedCount=0, blockCacheHitRatio=1, blockCacheHitCachingRatio=8

2011-05-11 14:18:31,067 INFO org.apache.hadoop.hbase.regionserver.HRegionServer: STOPPED: Replay of HLog required. Forcing server shutdown

2011-05-11 14:18:31,067 INFO org.apache.hadoop.hbase.regionserver.MemStoreFlusher: regionserver60020.cacheFlusher exiting

 

HBase version is 0.90.2 and Hadoop is compiled from branch-0.20-append.

Has anyone experienced something similar, or does anyone have an idea
where we should start looking?

Thanks!

Chris

 


Re: ArrayIndexOutOfBoundsException in FSOutputSummer.write()

Posted by Stack <st...@duboce.net>.
On Thu, May 12, 2011 at 7:37 AM, Chris Bohme <ch...@pinkmatter.com> wrote:
> When manually browsing to the recovered.edits folder in HDFS and opening
> them with HFile an error is shown: "Trailer header is wrong...."
>

They are not HFiles, so yes, you'll see that (they are straight
SequenceFiles, IIRC).
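
If you want to eyeball what's inside one of those files, something like
the sketch below should work. This is a rough, untested sketch against
the 0.90-era classes (HLogKey/WALEdit live in
org.apache.hadoop.hbase.regionserver.wal); the class name is made up:

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.hbase.HBaseConfiguration;
import org.apache.hadoop.hbase.regionserver.wal.HLogKey;
import org.apache.hadoop.hbase.regionserver.wal.WALEdit;
import org.apache.hadoop.io.SequenceFile;

public class DumpRecoveredEdits {
  public static void main(String[] args) throws Exception {
    Configuration conf = HBaseConfiguration.create();
    Path edits = new Path(args[0]);  // e.g. .../recovered.edits/0000000000000063598
    FileSystem fs = edits.getFileSystem(conf);
    // recovered.edits files are SequenceFiles of HLogKey -> WALEdit,
    // which is why the HFile tool rejects them ("Trailer header is wrong").
    SequenceFile.Reader reader = new SequenceFile.Reader(fs, edits, conf);
    HLogKey key = new HLogKey();
    WALEdit edit = new WALEdit();
    while (reader.next(key, edit)) {
      System.out.println(key + " -> " + edit);
    }
    reader.close();
  }
}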

> If the edit files mean anything to you, we can post them as well.
>

Yes please.  Can I see
hdfs://eagle1:9000/hbase/LongTable/58e7c587ac3992ed20fc1a457a07ccd9/recovered.edits/0000000000000063598

Any errors in the master log around the creation of the above file?
You can grep for the file or region name in your master log.

Thanks for the info,
St.Ack

> Thanks so far!
>
> Chris
>
> [earlier quoted messages snipped; they appear in full below in this thread]

RE: ArrayIndexOutOfBoundsException in FSOutputSummer.write()

Posted by Chris Bohme <ch...@pinkmatter.com>.
1 master
4 region servers
3 ZooKeeper nodes (1 on the master, 2 on region server nodes)
all running Ubuntu 10.10 with HBase 0.90.2 and Hadoop branch-0.20-append

We're running a performance test against a table with 2 column families
and 10 columns; all inserted values are random longs. The test runs from
a single client, alternating between writes and reads. Everything goes
well until about 50 million rows, at which point the cluster fails, with
2 of the region servers shutting down due to the
ArrayIndexOutOfBoundsException above.
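
For reference, the write/read loop looks roughly like the sketch below.
This is a simplification, not our actual harness -- the class and column
names are made up, and it just uses the plain 0.90 client API:

import java.util.Random;
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.hbase.HBaseConfiguration;
import org.apache.hadoop.hbase.client.Get;
import org.apache.hadoop.hbase.client.HTable;
import org.apache.hadoop.hbase.client.Put;
import org.apache.hadoop.hbase.util.Bytes;

public class LongTableLoadTest {
  public static void main(String[] args) throws Exception {
    Configuration conf = HBaseConfiguration.create();
    HTable table = new HTable(conf, "LongTable");
    byte[][] families = { Bytes.toBytes("Family1"), Bytes.toBytes("Family2") };
    Random rnd = new Random();
    for (long i = 0; i < 50000000L; i++) {
      Put put = new Put(Bytes.toBytes(i));  // 8-byte big-endian row key
      for (byte[] family : families) {
        for (int c = 0; c < 10; c++) {
          // every cell holds a random long
          put.add(family, Bytes.toBytes("col" + c), Bytes.toBytes(rnd.nextLong()));
        }
      }
      table.put(put);
      if (i > 0 && i % 1000 == 0) {
        // alternate in reads of a random earlier row
        long readRow = (long) (rnd.nextDouble() * i);
        table.get(new Get(Bytes.toBytes(readRow)));
      }
    }
    table.close();
  }
}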

When HBase is restarted and the edits are replayed, the same exception is
thrown again:

2011-05-11 19:32:00,346 INFO org.apache.hadoop.hbase.regionserver.HRegion: Replaying edits from hdfs://eagle1:9000/hbase/LongTable/58e7c587ac3992ed20fc1a457a07ccd9/recovered.edits/0000000000000063598; minSequenceid=64118
2011-05-11 19:32:00,989 DEBUG org.apache.hadoop.hbase.regionserver.HRegion: Started memstore flush for LongTable,\x00\x00\x00\x00\x05\xA9\xA4\xB5,1305115670639.58e7c587ac3992ed20fc1a457a07ccd9., current region memstore size 64.2m; wal is null, using passed sequenceid=66412
2011-05-11 19:32:00,989 DEBUG org.apache.hadoop.hbase.regionserver.HRegion: Finished snapshotting, commencing flushing stores
2011-05-11 19:32:01,179 ERROR org.apache.hadoop.hbase.regionserver.handler.OpenRegionHandler: Failed open of region=LongTable,\x00\x00\x00\x00\x05\xA9\xA4\xB5,1305115670639.58e7c587ac3992ed20fc1a457a07ccd9.
org.apache.hadoop.hbase.DroppedSnapshotException: region: LongTable,\x00\x00\x00\x00\x05\xA9\xA4\xB5,1305115670639.58e7c587ac3992ed20fc1a457a07ccd9.
        at org.apache.hadoop.hbase.regionserver.HRegion.internalFlushcache(HRegion.java:995)
        at org.apache.hadoop.hbase.regionserver.HRegion.replayRecoveredEdits(HRegion.java:1950)
        at org.apache.hadoop.hbase.regionserver.HRegion.replayRecoveredEditsIfAny(HRegion.java:1833)
        at org.apache.hadoop.hbase.regionserver.HRegion.initialize(HRegion.java:354)
        at org.apache.hadoop.hbase.regionserver.HRegion.openHRegion(HRegion.java:2551)
        at org.apache.hadoop.hbase.regionserver.HRegion.openHRegion(HRegion.java:2537)
        at org.apache.hadoop.hbase.regionserver.handler.OpenRegionHandler.openRegion(OpenRegionHandler.java:266)
        at org.apache.hadoop.hbase.regionserver.handler.OpenRegionHandler.process(OpenRegionHandler.java:98)
        at org.apache.hadoop.hbase.executor.EventHandler.run(EventHandler.java:151)
        at java.util.concurrent.ThreadPoolExecutor$Worker.runTask(ThreadPoolExecutor.java:886)
        at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:908)
        at java.lang.Thread.run(Thread.java:662)
Caused by: java.lang.ArrayIndexOutOfBoundsException
        at org.apache.hadoop.fs.FSOutputSummer.write(FSOutputSummer.java:83)
        at org.apache.hadoop.fs.FSDataOutputStream$PositionCache.write(FSDataOutputStream.java:49)
        at java.io.DataOutputStream.write(DataOutputStream.java:90)
        at java.io.DataOutputStream.write(DataOutputStream.java:90)
        at org.apache.hadoop.hbase.io.hfile.HFile$Writer.append(HFile.java:544)
        at org.apache.hadoop.hbase.io.hfile.HFile$Writer.append(HFile.java:501)
        at org.apache.hadoop.hbase.regionserver.StoreFile$Writer.append(StoreFile.java:836)
        at org.apache.hadoop.hbase.regionserver.Store.internalFlushCache(Store.java:479)
        at org.apache.hadoop.hbase.regionserver.Store.flushCache(Store.java:448)
        at org.apache.hadoop.hbase.regionserver.Store.access$100(Store.java:81)
        at org.apache.hadoop.hbase.regionserver.Store$StoreFlusherImpl.flushCache(Store.java:1513)
        at org.apache.hadoop.hbase.regionserver.HRegion.internalFlushcache(HRegion.java:973)
        ... 11 more

When we manually browse to the recovered.edits folder in HDFS and open
the files with the HFile tool, an error is shown: "Trailer header is
wrong...."

If the edit files mean anything to you, we can post them as well.

Thanks so far!

Chris


-----Original Message-----
From: saint.ack@gmail.com [mailto:saint.ack@gmail.com] On Behalf Of Stack
Sent: 11 May 2011 06:56 PM
To: user@hbase.apache.org
Subject: Re: ArrayIndexOutOfBoundsException in FSOutputSummer.write()

[quoted reply and original message snipped; see Stack's reply below in
this thread]


Re: ArrayIndexOutOfBoundsException in FSOutputSummer.write()

Posted by Stack <st...@duboce.net>.
I have not seen this before.  You are failing because of a
java.lang.ArrayIndexOutOfBoundsException in
org.apache.hadoop.fs.FSOutputSummer.write(FSOutputSummer.java:83).
Tell us more about your context.  Are you using compression?  What
kind of hardware and operating system?  (I'm trying to figure out what
is different about your setup that would bring on this AIOOBE.)
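
For background: if I remember the 0.20-era code right, FSOutputSummer
validates its arguments up front and throws that AIOOBE itself when the
offset/length pair doesn't fit inside the array, roughly like the
paraphrase below (from memory, not a verbatim copy of the real class;
the class name here is made up):

import java.io.IOException;

// Paraphrase, from memory, of the argument check at the top of 0.20-era
// org.apache.hadoop.fs.FSOutputSummer.write(byte[], int, int).
abstract class OutputSummerSketch {
  public synchronized void write(byte[] b, int off, int len) throws IOException {
    if (off < 0 || len < 0 || off > b.length - len) {
      // Fires when a caller hands in an offset/length that falls outside
      // the array -- for instance a negative or oversized length field
      // read out of the data being flushed.
      throw new ArrayIndexOutOfBoundsException();
    }
    for (int n = 0; n < len; n += write1(b, off + n, len - n)) {
      // write1 copies into the checksum buffer one chunk at a time
    }
  }
  // Buffers up to one checksum chunk and returns the bytes consumed.
  protected abstract int write1(byte[] b, int off, int len) throws IOException;
}

If that is the check you're tripping, it would point at the KeyValue
being appended carrying a bad offset or length, rather than at HDFS
itself.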

Thank you,
St.Ack

On Wed, May 11, 2011 at 6:30 AM, Chris Bohme <ch...@pinkmatter.com> wrote:
> [original message snipped; see the top of this thread]