Posted to issues@hbase.apache.org by "Esteban Gutierrez (JIRA)" <ji...@apache.org> on 2014/06/11 06:33:01 UTC

[jira] [Commented] (HBASE-11325) Malformed RPC calls can corrupt stores

    [ https://issues.apache.org/jira/browse/HBASE-11325?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14027398#comment-14027398 ] 

Esteban Gutierrez commented on HBASE-11325:
-------------------------------------------

This is how the RS aborted due to the corrupt entry in the memstore:
{code}
14/06/05 18:41:44 FATAL regionserver.HRegionServer: ABORTING region server 172.16.0.101,60020,1402018185865: Unrecoverable exception while closing region t0,,1402015274138.a9b83f7801ce96574aeeb2be048690b8., still finishing close
org.apache.hadoop.hbase.DroppedSnapshotException: region: t0,,1402015274138.a9b83f7801ce96574aeeb2be048690b8.
	at org.apache.hadoop.hbase.regionserver.HRegion.internalFlushcache(HRegion.java:1606)
	at org.apache.hadoop.hbase.regionserver.HRegion.internalFlushcache(HRegion.java:1480)
	at org.apache.hadoop.hbase.regionserver.HRegion.doClose(HRegion.java:1009)
	at org.apache.hadoop.hbase.regionserver.HRegion.close(HRegion.java:957)
	at org.apache.hadoop.hbase.regionserver.handler.CloseRegionHandler.process(CloseRegionHandler.java:119)
	at org.apache.hadoop.hbase.executor.EventHandler.run(EventHandler.java:175)
	at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
	at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
	at java.lang.Thread.run(Thread.java:724)
Caused by: java.io.IOException: ScanWildcardColumnTracker.checkColumn ran into a column actually smaller than the previous column:
	at org.apache.hadoop.hbase.regionserver.ScanWildcardColumnTracker.checkColumn(ScanWildcardColumnTracker.java:104)
	at org.apache.hadoop.hbase.regionserver.ScanQueryMatcher.match(ScanQueryMatcher.java:357)
	at org.apache.hadoop.hbase.regionserver.StoreScanner.next(StoreScanner.java:365)
	at org.apache.hadoop.hbase.regionserver.StoreScanner.next(StoreScanner.java:311)
	at org.apache.hadoop.hbase.regionserver.Store.internalFlushCache(Store.java:812)
	at org.apache.hadoop.hbase.regionserver.Store.flushCache(Store.java:746)
	at org.apache.hadoop.hbase.regionserver.Store$StoreFlusherImpl.flushCache(Store.java:2348)
	at org.apache.hadoop.hbase.regionserver.HRegion.internalFlushcache(HRegion.java:1581)
{code}
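
The IOException above says that checkColumn saw a column smaller than the previous one, which means the cells reached the flush scanner out of sort order. As a rough illustration only (this is not the HBase implementation; the class and method names below are made up for the example), an ordering check of that kind over column qualifiers could look like this:

```java
import java.nio.charset.StandardCharsets;
import java.util.Arrays;
import java.util.List;

// Hypothetical helper, not part of HBase: finds the first qualifier that
// sorts before its predecessor under unsigned lexicographic byte order.
final class ColumnOrderCheck {

    /** Compares two byte arrays byte by byte, treating each byte as unsigned. */
    static int compareUnsigned(byte[] a, byte[] b) {
        int n = Math.min(a.length, b.length);
        for (int i = 0; i < n; i++) {
            int x = a[i] & 0xff;
            int y = b[i] & 0xff;
            if (x != y) {
                return x - y;
            }
        }
        // On a shared prefix, the shorter array sorts first.
        return a.length - b.length;
    }

    /** Returns the index of the first out-of-order qualifier, or -1 if sorted. */
    static int firstOutOfOrder(List<byte[]> qualifiers) {
        for (int i = 1; i < qualifiers.size(); i++) {
            if (compareUnsigned(qualifiers.get(i), qualifiers.get(i - 1)) < 0) {
                return i;
            }
        }
        return -1;
    }

    public static void main(String[] args) {
        List<byte[]> columns = Arrays.asList(
                "a".getBytes(StandardCharsets.UTF_8),
                "c".getBytes(StandardCharsets.UTF_8),
                "b".getBytes(StandardCharsets.UTF_8));
        // "b" follows "c", so the third entry (index 2) breaks the ordering.
        System.out.println(firstOutOfOrder(columns));
    }
}
```

A scanner that relies on sorted input has no good recovery once this check fails, which is why the flush surfaces it as an exception instead of skipping the cell.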

When the malformed RPC Put didn't crash the RS, it was sometimes possible to end up with a corrupt HFile:
{code}
4/06/05 19:24:06 ERROR compactions.CompactionRequest: Compaction failed regionName=t0,,1402020343626.25a1ee35a486a512b5b3c18e1c56ba39., storeName=f, fileCount=10, fileSize=6.8k (875.0, 678.0, 678.0, 678.0, 678.0, 712.0, 678.0, 678.0, 678.0, 678.0), priority=-7, time=1402021446164920000
java.lang.ArrayIndexOutOfBoundsException: 274
	at org.apache.hadoop.hbase.regionserver.ScanQueryMatcher.match(ScanQueryMatcher.java:251)
	at org.apache.hadoop.hbase.regionserver.StoreScanner.next(StoreScanner.java:365)
	at org.apache.hadoop.hbase.regionserver.StoreScanner.next(StoreScanner.java:311)
	at org.apache.hadoop.hbase.regionserver.Compactor.compact(Compactor.java:184)
	at org.apache.hadoop.hbase.regionserver.Store.compact(Store.java:1081)
	at org.apache.hadoop.hbase.regionserver.HRegion.compact(HRegion.java:1336)
	at org.apache.hadoop.hbase.regionserver.compactions.CompactionRequest.run(CompactionRequest.java:303)
	at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
	at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
	at java.lang.Thread.run(Thread.java:724)
14/06/05 19:24:06 DEBUG master.AssignmentManager: The znode of region t0,,1402020343626.25a1ee35a486a512b5b3c18e1c56ba39. has been deleted.
{code}

Past a certain point, the file could no longer be inspected:
{code}
K: 2\x01fc\x00\x00\x01F\x86\xE1\xC5\xC9/two\x00:/1402422281673/4/vlen=3/ts=0 V: two
Exception in thread "main" java.lang.ArrayIndexOutOfBoundsException: 264
	at org.apache.hadoop.hbase.util.Bytes.toStringBinary(Bytes.java:387)
	at org.apache.hadoop.hbase.KeyValue.keyToString(KeyValue.java:775)
	at org.apache.hadoop.hbase.KeyValue.toString(KeyValue.java:731)
	at java.lang.String.valueOf(String.java:2826)
	at java.lang.StringBuilder.append(StringBuilder.java:115)
	at org.apache.hadoop.hbase.io.hfile.HFilePrettyPrinter.scanKeysValues(HFilePrettyPrinter.java:269)
	at org.apache.hadoop.hbase.io.hfile.HFilePrettyPrinter.processFile(HFilePrettyPrinter.java:229)
	at org.apache.hadoop.hbase.io.hfile.HFilePrettyPrinter.run(HFilePrettyPrinter.java:189)
	at org.apache.hadoop.hbase.io.hfile.HFile.main(HFile.java:750)
{code}
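
The ArrayIndexOutOfBoundsException in both traces would be consistent with length fields inside a serialized KeyValue pointing past the end of its backing array. The sketch below is not the fix for this issue and not HBase code; it only illustrates a bounds check over the documented KeyValue layout (4-byte key length, 4-byte value length, then a key made of 2-byte row length, row, 1-byte family length, family, qualifier, 8-byte timestamp, 1-byte type, followed by the value). The class name, method names, and the strict "no trailing bytes" rule are assumptions made for the example.

```java
import java.nio.ByteBuffer;
import java.nio.charset.StandardCharsets;

// Hypothetical validator, not part of HBase: checks that the length fields
// of a serialized KeyValue agree with the size of the buffer holding it.
final class KeyValueLayoutCheck {

    // Smallest possible key: row length (2) + family length (1) + timestamp (8) + type (1).
    private static final int MIN_KEY_LENGTH = 2 + 1 + 8 + 1;

    static boolean isWellFormed(byte[] buf) {
        if (buf == null || buf.length < 8) {
            return false;
        }
        ByteBuffer bb = ByteBuffer.wrap(buf); // big-endian by default
        int keyLen = bb.getInt();
        int valueLen = bb.getInt();
        if (keyLen < MIN_KEY_LENGTH || valueLen < 0) {
            return false;
        }
        // Assumption for this sketch: the buffer holds exactly one KeyValue.
        if ((long) keyLen + valueLen != buf.length - 8) {
            return false;
        }
        int rowLen = bb.getShort();
        if (rowLen < 0) {
            return false;
        }
        // Row, family-length byte, timestamp and type must all fit inside the key.
        long fixed = 2L + rowLen + 1 + 8 + 1;
        if (fixed > keyLen) {
            return false;
        }
        int famLen = buf[8 + 2 + rowLen] & 0xff;
        return fixed + famLen <= keyLen;
    }

    /** Serializes one cell in the layout described above (test helper). */
    static byte[] encode(byte[] row, byte[] family, byte[] qualifier,
                         long ts, byte type, byte[] value) {
        int keyLen = 2 + row.length + 1 + family.length + qualifier.length + 8 + 1;
        ByteBuffer bb = ByteBuffer.allocate(8 + keyLen + value.length);
        bb.putInt(keyLen).putInt(value.length);
        bb.putShort((short) row.length).put(row);
        bb.put((byte) family.length).put(family);
        bb.put(qualifier).putLong(ts).put(type).put(value);
        return bb.array();
    }

    public static void main(String[] args) {
        byte[] good = encode(
                "2".getBytes(StandardCharsets.UTF_8),
                "f".getBytes(StandardCharsets.UTF_8),
                "c".getBytes(StandardCharsets.UTF_8),
                1402422281673L,
                (byte) 4, // 4 is the Put type code
                "two".getBytes(StandardCharsets.UTF_8));
        System.out.println(isWellFormed(good));

        // Overwrite the row length with 32767 to simulate a malformed request.
        byte[] bad = good.clone();
        bad[8] = 0x7f;
        bad[9] = (byte) 0xff;
        System.out.println(isWellFormed(bad));
    }
}
```

A check like this only catches inconsistent lengths; it says nothing about whether the row, family, or qualifier bytes themselves are what the client intended.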


> Malformed RPC calls can corrupt stores
> --------------------------------------
>
>                 Key: HBASE-11325
>                 URL: https://issues.apache.org/jira/browse/HBASE-11325
>             Project: HBase
>          Issue Type: Bug
>          Components: Client, regionserver
>    Affects Versions: 0.94.20
>            Reporter: Esteban Gutierrez
>
> We noticed a Region Server in a cluster that aborted with a DroppedSnapshotException due to an IOException in ScanWildcardColumnTracker when the RS tried to flush the memstore. Further investigation found that a client was sending corrupt RPC requests to the RS, and those corrupt requests ended up in the stores, corrupting the memstore itself and, in some cases, HFiles.  More details to follow.


