Posted to issues@hbase.apache.org by "zhuobin zheng (Jira)" <ji...@apache.org> on 2021/11/18 18:57:00 UTC

[jira] [Work started] (HBASE-26467) Wrong Cell Generated by MemStoreLABImpl.forceCopyOfBigCellInto when Cell size bigger than data chunk size

     [ https://issues.apache.org/jira/browse/HBASE-26467?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Work on HBASE-26467 started by zhuobin zheng.
---------------------------------------------
> Wrong Cell Generated by MemStoreLABImpl.forceCopyOfBigCellInto when Cell size bigger than data chunk size 
> ----------------------------------------------------------------------------------------------------------
>
>                 Key: HBASE-26467
>                 URL: https://issues.apache.org/jira/browse/HBASE-26467
>             Project: HBase
>          Issue Type: Bug
>            Reporter: zhuobin zheng
>            Assignee: zhuobin zheng
>            Priority: Critical
>
> In our company's 2.x cluster, I found that compactions of some regions keep failing because some cells cannot be constructed successfully. In fact, we cannot even read these cells.
> From the following stack trace, we can see that the bug causes KeyValue construction to fail.
> Sample log and stack trace:
> {code:java}
> 2021-11-18 16:50:47,708 ERROR [regionserver/xxxx:60020-longCompactions-4] regionserver.CompactSplit: Compaction failed region=xx_table,3610ff49595a0fc4a824f2a575f37a31,1570874723992.dac703ceb35e8d8703233bebf34ae49f., storeName=c, priority=-319, startTime=1637225447127 
> java.lang.IllegalArgumentException: Invalid tag length at position=4659867, tagLength=0,
>         at org.apache.hadoop.hbase.KeyValueUtil.checkKeyValueTagBytes(KeyValueUtil.java:685)
>         at org.apache.hadoop.hbase.KeyValueUtil.checkKeyValueBytes(KeyValueUtil.java:643)
>         at org.apache.hadoop.hbase.KeyValue.<init>(KeyValue.java:345)
>         at org.apache.hadoop.hbase.SizeCachedKeyValue.<init>(SizeCachedKeyValue.java:43)
>         at org.apache.hadoop.hbase.io.hfile.HFileReaderImpl$HFileScannerImpl.getCell(HFileReaderImpl.java:981)
>         at org.apache.hadoop.hbase.regionserver.StoreFileScanner.seek(StoreFileScanner.java:233)
>         at org.apache.hadoop.hbase.regionserver.StoreScanner.seekScanners(StoreScanner.java:418)
>         at org.apache.hadoop.hbase.regionserver.StoreScanner.<init>(StoreScanner.java:322)
>         at org.apache.hadoop.hbase.regionserver.StoreScanner.<init>(StoreScanner.java:288)
>         at org.apache.hadoop.hbase.regionserver.compactions.Compactor.createScanner(Compactor.java:487)
>         at org.apache.hadoop.hbase.regionserver.compactions.Compactor$1.createScanner(Compactor.java:248)
>         at org.apache.hadoop.hbase.regionserver.compactions.Compactor.compact(Compactor.java:318)
>         at org.apache.hadoop.hbase.regionserver.compactions.DefaultCompactor.compact(DefaultCompactor.java:65)
>         at org.apache.hadoop.hbase.regionserver.DefaultStoreEngine$DefaultCompactionContext.compact(DefaultStoreEngine.java:126)
>         at org.apache.hadoop.hbase.regionserver.HStore.compact(HStore.java:1468)
>         at org.apache.hadoop.hbase.regionserver.HRegion.compact(HRegion.java:2266)
>         at org.apache.hadoop.hbase.regionserver.CompactSplit$CompactionRunner.doCompaction(CompactSplit.java:624)
>         at org.apache.hadoop.hbase.regionserver.CompactSplit$CompactionRunner.run(CompactSplit.java:666)
>         at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
>         at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
>         at java.lang.Thread.run(Thread.java:748) {code}
> From further observation, I found the following characteristics:
>  # The cell size is larger than 2 MB.
>  # The bug can be reproduced only after in-memory compaction.
>  # The cell bytes end with \x00\x02\x00\x00.
>  
> The root cause is that MemStoreLABImpl.forceCopyOfBigCellInto, which is invoked only when a cell is bigger than the data chunk size, constructs the cell with the wrong length. As a result, 4 extra bytes (the chunk header size) end up at the end of the cell bytes.
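
The length mistake described above can be illustrated with a small Java model. This is a hypothetical sketch, not the MemStoreLABImpl source: `BigCellCopyDemo`, `CellView`, `needsDedicatedChunk` and `copyIntoDedicatedChunk` are illustrative names, `CHUNK_HEADER_SIZE` stands for the 4-byte chunk header mentioned in the report, and `DATA_CHUNK_SIZE` assumes HBase's default 2 MB MSLAB chunk size. The dedicated chunk is sized for header plus cell; the buggy variant then reuses that header-inclusive size as the cell length.

```java
/**
 * Hypothetical, simplified model of copying an oversized cell into its own chunk
 * (illustrative names; not the MemStoreLABImpl source).
 */
public class BigCellCopyDemo {
  static final int CHUNK_HEADER_SIZE = 4;              // bytes reserved at the start of a chunk
  static final int DATA_CHUNK_SIZE = 2 * 1024 * 1024;  // assumed: default 2 MB data chunk

  /** A view of a cell inside a chunk: backing array, offset and claimed length. */
  static final class CellView {
    final byte[] backing;
    final int offset;
    final int length;

    CellView(byte[] backing, int offset, int length) {
      this.backing = backing;
      this.offset = offset;
      this.length = length;
    }
  }

  /** Assumed rule: a dedicated chunk is needed once header + cell exceed a data chunk. */
  static boolean needsDedicatedChunk(int cellLength) {
    return cellLength + CHUNK_HEADER_SIZE > DATA_CHUNK_SIZE;
  }

  /** Copies the cell into a new chunk laid out as header | cell. */
  static CellView copyIntoDedicatedChunk(byte[] cell, boolean buggy) {
    int size = cell.length + CHUNK_HEADER_SIZE;  // allocation must cover the header too
    byte[] chunk = new byte[size];
    System.arraycopy(cell, 0, chunk, CHUNK_HEADER_SIZE, cell.length);
    // Buggy variant: the header-inclusive size is reused as the cell length,
    // so the cell claims CHUNK_HEADER_SIZE bytes that are not part of it.
    int cellLength = buggy ? size : cell.length;
    return new CellView(chunk, CHUNK_HEADER_SIZE, cellLength);
  }

  public static void main(String[] args) {
    byte[] cell = new byte[DATA_CHUNK_SIZE + 1];
    System.out.println(needsDedicatedChunk(cell.length));                          // true
    System.out.println(copyIntoDedicatedChunk(cell, true).length - cell.length);   // 4
    System.out.println(copyIntoDedicatedChunk(cell, false).length - cell.length);  // 0
  }
}
```

In this model the fix is to keep the header in the allocation size while giving the copied cell its original length.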

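The trailing bytes \x00\x02\x00\x00 plausibly explain the `tagLength=0` in the stack trace. The sketch below is a simplified model, assuming the KeyValue layout keyLength (int) | valueLength (int) | key | value | optional tagsLength (short) | tags, with each tag prefixed by a 2-byte length and holding at least a 1-byte type. `TrailingBytesDemo`, `buildCell` and `checkTags` are hypothetical names, the key is treated as opaque bytes, and the real checks in `KeyValueUtil` may differ in detail. For a cell without tags, the extra 4 bytes are read as tagsLength = 2 followed by a tag whose length field is 0, which is rejected.

```java
import java.nio.ByteBuffer;
import java.nio.charset.StandardCharsets;

/**
 * Simplified, hypothetical model of the KeyValue byte layout (not HBase code):
 * keyLength(int) | valueLength(int) | key | value | [tagsLength(short) | tags]
 * Each tag is assumed to be tagLength(short) | type(byte) | value bytes.
 */
public class TrailingBytesDemo {

  /** Serializes a tag-less cell and appends the given trailing bytes. */
  static byte[] buildCell(byte[] key, byte[] value, byte[] trailing) {
    ByteBuffer buf = ByteBuffer.allocate(8 + key.length + value.length + trailing.length);
    buf.putInt(key.length).putInt(value.length).put(key).put(value).put(trailing);
    return buf.array();
  }

  /** Returns "ok" or a message describing the first tag problem found. */
  static String checkTags(byte[] cell) {
    ByteBuffer buf = ByteBuffer.wrap(cell); // big-endian by default
    int keyLength = buf.getInt(0);
    int valueLength = buf.getInt(4);
    int pos = 8 + keyLength + valueLength;
    if (pos == cell.length) {
      return "ok"; // no tags section
    }
    if (pos + 2 > cell.length) {
      return "Truncated tags length at position=" + pos;
    }
    int tagsLength = buf.getShort(pos) & 0xFFFF;
    pos += 2;
    int end = pos + tagsLength;
    if (end != cell.length) {
      return "Invalid tags length=" + tagsLength;
    }
    while (pos < end) {
      if (pos + 2 > end) {
        return "Truncated tag at position=" + pos;
      }
      int tagLength = buf.getShort(pos) & 0xFFFF;
      // Assumed rule: a tag needs at least its 1-byte type and must fit in the section.
      if (tagLength < 1 || pos + 2 + tagLength > end) {
        return "Invalid tag length at position=" + pos + ", tagLength=" + tagLength;
      }
      pos += 2 + tagLength;
    }
    return "ok";
  }

  public static void main(String[] args) {
    byte[] key = "row/c:q".getBytes(StandardCharsets.UTF_8); // 7 opaque key bytes
    byte[] value = new byte[16];
    // prints "ok"
    System.out.println(checkTags(buildCell(key, value, new byte[0])));
    // prints "Invalid tag length at position=33, tagLength=0"
    System.out.println(checkTags(buildCell(key, value, new byte[] {0x00, 0x02, 0x00, 0x00})));
  }
}
```

The position in this model is relative to the cell, so it will not match the block-level position=4659867 from the log.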


--
This message was sent by Atlassian Jira
(v8.20.1#820001)