Posted to dev@hbase.apache.org by "ranpanfeng (Jira)" <ji...@apache.org> on 2019/09/27 08:25:00 UTC

[jira] [Created] (HBASE-23084) Assertion "Added a key not lexically larger than previous" or "prevKey followed by a smaller key" caused abortion of RS

ranpanfeng created HBASE-23084:
----------------------------------

             Summary: Assertion "Added a key not lexically larger than previous" or "prevKey followed by a smaller key" caused abortion of RS
                 Key: HBASE-23084
                 URL: https://issues.apache.org/jira/browse/HBASE-23084
             Project: HBase
          Issue Type: Bug
          Components: regionserver
         Environment: We store many Phoenix tables in HBase; only one table triggers this issue.

When two adjacent keys are in the wrong order, the observations are as follows.

1. The prevKey is written into HBase via a Put RPC.

2. The current key is written via a Phoenix UPSERT statement (a Multi RPC in the underlying HBase).

3. The current cell exceeds 1 MB.

4. The error is easy to reproduce.
            Reporter: ranpanfeng


When an automatic or manual flush is invoked, the cells in an immutable memstore segment are scanned one by one to build the data blocks and bloom filter blocks. During this scan, adjacent keys are compared to assert that they are in lexicographical order. Regretfully, this assertion fails in some scenarios, and whenever a RegionServer hits such an assertion failure it calls abort() and terminates abnormally.
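For reference, the spirit of the order check is sketched below. This is an illustrative standalone class (names are mine, not HBase code), simplified to comparing raw key bytes only; the real StoreScanner.checkScanOrder uses the full CellComparator, which also considers family, qualifier, timestamp and type.

{code:java}
import java.util.Arrays;

public class ScanOrderCheck {
    // Unsigned lexicographic comparison of key bytes, as HBase's
    // byte comparators do (bytes compared as 0..255, not signed).
    static int compareUnsigned(byte[] a, byte[] b) {
        int n = Math.min(a.length, b.length);
        for (int i = 0; i < n; i++) {
            int d = (a[i] & 0xff) - (b[i] & 0xff);
            if (d != 0) return d;
        }
        return a.length - b.length;
    }

    // Every key emitted by the scan must compare >= the previous key;
    // a violation is exactly the "followed by a smaller key" assertion.
    static void checkScanOrder(byte[] prevKey, byte[] key) {
        if (prevKey != null && compareUnsigned(prevKey, key) > 0) {
            throw new AssertionError("Key " + Arrays.toString(prevKey)
                + " followed by a smaller key " + Arrays.toString(key));
        }
    }

    public static void main(String[] args) {
        checkScanOrder(new byte[]{1}, new byte[]{2});      // in order: ok
        try {
            checkScanOrder(new byte[]{2}, new byte[]{1});  // out of order
        } catch (AssertionError e) {
            System.out.println("assertion triggered");
        }
    }
}
{code}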

 

The version is commit 0ba7cc01a13dbb7fec8e6c5d599fc8b4936bae61 (tag: rel/2.1.5).

 

At least three error cases have been observed.

*error case#1:  getQualifierArray throws java.lang.ArrayIndexOutOfBoundsException*

 
{code:java}
17:57:50.328 [MemStoreFlusher.0] ERROR org.apache.hadoop.hbase.regionserver.HRegionServer - ***** ABORTING region server ${regionserver}: Replay of WAL required. Forcing server shutdown *****
org.apache.hadoop.hbase.DroppedSnapshotException: region: ${region}.
        at org.apache.hadoop.hbase.regionserver.HRegion.internalFlushCacheAndCommit(HRegion.java:2788) [hbase-server-2.1.5.jar:2.1.5]
        at org.apache.hadoop.hbase.regionserver.HRegion.internalFlushcache(HRegion.java:2467) [hbase-server-2.1.5.jar:2.1.5]
        at org.apache.hadoop.hbase.regionserver.HRegion.internalFlushcache(HRegion.java:2439) [hbase-server-2.1.5.jar:2.1.5]
        at org.apache.hadoop.hbase.regionserver.HRegion.flushcache(HRegion.java:2329) [hbase-server-2.1.5.jar:2.1.5]
        at org.apache.hadoop.hbase.regionserver.MemStoreFlusher.flushRegion(MemStoreFlusher.java:612) [hbase-server-2.1.5.jar:2.1.5]
        at org.apache.hadoop.hbase.regionserver.MemStoreFlusher.flushRegion(MemStoreFlusher.java:581) [hbase-server-2.1.5.jar:2.1.5]
        at org.apache.hadoop.hbase.regionserver.MemStoreFlusher.access$1000(MemStoreFlusher.java:68) [hbase-server-2.1.5.jar:2.1.5]
        at org.apache.hadoop.hbase.regionserver.MemStoreFlusher$FlushHandler.run(MemStoreFlusher.java:361) [hbase-server-2.1.5.jar:2.1.5]
        at java.lang.Thread.run(Thread.java:748) [?:1.8.0_144]
Caused by: java.lang.ArrayIndexOutOfBoundsException
        at org.apache.hadoop.hbase.util.ByteBufferUtils.copyFromBufferToArray(ByteBufferUtils.java:1155) ~[hbase-common-2.1.5.jar:2.1.5]
        at org.apache.hadoop.hbase.CellUtil.copyQualifierTo(CellUtil.java:309) ~[hbase-common-2.1.5.jar:2.1.5]
        at org.apache.hadoop.hbase.CellUtil.cloneQualifier(CellUtil.java:121) ~[hbase-common-2.1.5.jar:2.1.5]
        at org.apache.hadoop.hbase.ByteBufferKeyValue.getQualifierArray(ByteBufferKeyValue.java:112) ~[hbase-common-2.1.5.jar:2.1.5]
        at org.apache.hadoop.hbase.CellUtil.getCellKeyAsString(CellUtil.java:1306) ~[hbase-common-2.1.5.jar:2.1.5]
        at org.apache.hadoop.hbase.CellUtil.toString(CellUtil.java:1483) ~[hbase-common-2.1.5.jar:2.1.5]
        at org.apache.hadoop.hbase.ByteBufferKeyValue.toString(ByteBufferKeyValue.java:296) ~[hbase-common-2.1.5.jar:2.1.5]
        at java.lang.String.valueOf(String.java:2994) ~[?:1.8.0_144]
        at java.lang.StringBuilder.append(StringBuilder.java:131) ~[?:1.8.0_144]
        at org.apache.hadoop.hbase.regionserver.StoreScanner.checkScanOrder(StoreScanner.java:1007) ~[hbase-server-2.1.5.jar:2.1.5]
        at org.apache.hadoop.hbase.regionserver.StoreScanner.next(StoreScanner.java:582) ~[hbase-server-2.1.5.jar:2.1.5]
        at org.apache.hadoop.hbase.regionserver.StoreFlusher.performFlush(StoreFlusher.java:127) ~[hbase-server-2.1.5.jar:2.1.5]
        at org.apache.hadoop.hbase.regionserver.DefaultStoreFlusher.flushSnapshot(DefaultStoreFlusher.java:69) ~[hbase-server-2.1.5.jar:2.1.5]
        at org.apache.hadoop.hbase.regionserver.HStore.flushCache(HStore.java:1021) ~[hbase-server-2.1.5.jar:2.1.5]
        at org.apache.hadoop.hbase.regionserver.HStore$StoreFlusherImpl.flushCache(HStore.java:2259) ~[hbase-server-2.1.5.jar:2.1.5]
        at org.apache.hadoop.hbase.regionserver.HRegion.internalFlushCacheAndCommit(HRegion.java:2733) ~[hbase-server-2.1.5.jar:2.1.5]
        ... 8 more
17:57:50.332 [MemStoreFlusher.0] ERROR org.apache.hadoop.hbase.regionserver.HRegionServer - RegionServer abort: loaded coprocessors are: [org.apache.phoenix.coprocessor.ScanRegionObserver, org.apache.phoenix.coprocessor.UngroupedAggregateRegionObserver, org.apache.phoenix.hbase.index.Indexer, org.apache.phoenix.coprocessor.GroupedAggregateRegionObserver, org.apache.phoenix.coprocessor.ServerCachingEndpointImpl]
  
{code}
 

 

*error case#2:  StoreScanner.checkScanOrder triggers "prevKey followed by a smaller key"*

 
{code:java}
Exception in thread "RS_IN_MEMORY_COMPACTION-regionserver/${regionserver}" java.lang.AssertionError: Key ${prevKey} followed by a smaller key ${key}
        at org.apache.hadoop.hbase.regionserver.StoreScanner.checkScanOrder(StoreScanner.java:1007)
        at org.apache.hadoop.hbase.regionserver.StoreScanner.next(StoreScanner.java:582)
        at org.apache.hadoop.hbase.regionserver.MemStoreCompactorSegmentsIterator.refillKVS(MemStoreCompactorSegmentsIterator.java:150)
        at org.apache.hadoop.hbase.regionserver.MemStoreCompactorSegmentsIterator.hasNext(MemStoreCompactorSegmentsIterator.java:72)
        at org.apache.hadoop.hbase.regionserver.CellChunkImmutableSegment.initializeCellSet(CellChunkImmutableSegment.java:146)
        at org.apache.hadoop.hbase.regionserver.CellChunkImmutableSegment.<init>(CellChunkImmutableSegment.java:66)
        at org.apache.hadoop.hbase.regionserver.SegmentFactory.createImmutableSegment(SegmentFactory.java:133)
        at org.apache.hadoop.hbase.regionserver.SegmentFactory.createImmutableSegmentByCompaction(SegmentFactory.java:59)
        at org.apache.hadoop.hbase.regionserver.MemStoreCompactor.createSubstitution(MemStoreCompactor.java:217)
        at org.apache.hadoop.hbase.regionserver.MemStoreCompactor.doCompaction(MemStoreCompactor.java:171)
        at org.apache.hadoop.hbase.regionserver.MemStoreCompactor.start(MemStoreCompactor.java:109)
        at org.apache.hadoop.hbase.regionserver.CompactingMemStore.flushInMemory(CompactingMemStore.java:422)
        at org.apache.hadoop.hbase.regionserver.CompactingMemStore$InMemoryFlushRunnable.run(CompactingMemStore.java:534)
        at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
        at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
        at java.lang.Thread.run(Thread.java:748)
12:34:07.695 [MemStoreFlusher.0] INFO  org.apache.hadoop.hbase.regionserver.HRegion - Flushing 1/1 column families, dataSize=239.26 MB heapSize=256.00 MB
12:34:08.476 [MemStoreFlusher.0] ERROR org.apache.hadoop.hbase.regionserver.HRegionServer - ***** ABORTING region server ${regionserver}: Replay of WAL required. Forcing server shutdown *****
org.apache.hadoop.hbase.DroppedSnapshotException: region: ${region}.
        at org.apache.hadoop.hbase.regionserver.HRegion.internalFlushCacheAndCommit(HRegion.java:2788) [hbase-server-2.1.5.jar:2.1.5]
        at org.apache.hadoop.hbase.regionserver.HRegion.internalFlushcache(HRegion.java:2467) [hbase-server-2.1.5.jar:2.1.5]
        at org.apache.hadoop.hbase.regionserver.HRegion.internalFlushcache(HRegion.java:2439) [hbase-server-2.1.5.jar:2.1.5]
        at org.apache.hadoop.hbase.regionserver.HRegion.flushcache(HRegion.java:2329) [hbase-server-2.1.5.jar:2.1.5]
        at org.apache.hadoop.hbase.regionserver.MemStoreFlusher.flushRegion(MemStoreFlusher.java:612) [hbase-server-2.1.5.jar:2.1.5]
        at org.apache.hadoop.hbase.regionserver.MemStoreFlusher.flushRegion(MemStoreFlusher.java:581) [hbase-server-2.1.5.jar:2.1.5]
        at org.apache.hadoop.hbase.regionserver.MemStoreFlusher.access$1000(MemStoreFlusher.java:68) [hbase-server-2.1.5.jar:2.1.5]
        at org.apache.hadoop.hbase.regionserver.MemStoreFlusher$FlushHandler.run(MemStoreFlusher.java:361) [hbase-server-2.1.5.jar:2.1.5]
        at java.lang.Thread.run(Thread.java:748) [?:1.8.0_144]
Caused by: java.lang.AssertionError: Key ${prevKey} followed by a smaller key ${key}
        at org.apache.hadoop.hbase.regionserver.StoreScanner.checkScanOrder(StoreScanner.java:1007) ~[hbase-server-2.1.5.jar:2.1.5]
        at org.apache.hadoop.hbase.regionserver.StoreScanner.next(StoreScanner.java:582) ~[hbase-server-2.1.5.jar:2.1.5]
        at org.apache.hadoop.hbase.regionserver.StoreFlusher.performFlush(StoreFlusher.java:127) ~[hbase-server-2.1.5.jar:2.1.5]
        at org.apache.hadoop.hbase.regionserver.DefaultStoreFlusher.flushSnapshot(DefaultStoreFlusher.java:69) ~[hbase-server-2.1.5.jar:2.1.5]
        at org.apache.hadoop.hbase.regionserver.HStore.flushCache(HStore.java:1021) ~[hbase-server-2.1.5.jar:2.1.5]
        at org.apache.hadoop.hbase.regionserver.HStore$StoreFlusherImpl.flushCache(HStore.java:2259) ~[hbase-server-2.1.5.jar:2.1.5]
        at org.apache.hadoop.hbase.regionserver.HRegion.internalFlushCacheAndCommit(HRegion.java:2733) ~[hbase-server-2.1.5.jar:2.1.5]
        ... 8 more


{code}
 

 

 

*error case#3: BloomContext.sanityCheck triggers "Added a key not lexically larger than previous."* 

To gather more diagnostics, I wrapped the failing System.arraycopy in a try/catch that logs the buffer offsets instead of throwing:
{code:java}
--- a/hbase-common/src/main/java/org/apache/hadoop/hbase/util/ByteBufferUtils.java
+++ b/hbase-common/src/main/java/org/apache/hadoop/hbase/util/ByteBufferUtils.java
@@ -33,6 +33,8 @@ import org.apache.hadoop.hbase.io.util.StreamUtils;
 import org.apache.hadoop.io.IOUtils;
 import org.apache.hadoop.io.WritableUtils;
 import org.apache.yetus.audience.InterfaceAudience;
+import org.slf4j.Logger;
+import org.slf4j.LoggerFactory;
 import sun.nio.ch.DirectBuffer;
 
 import org.apache.hbase.thirdparty.com.google.common.annotations.VisibleForTesting;
@@ -51,6 +53,7 @@ public final class ByteBufferUtils {
   @VisibleForTesting
   final static boolean UNSAFE_AVAIL = UnsafeAvailChecker.isAvailable();
   public final static boolean UNSAFE_UNALIGNED = UnsafeAvailChecker.unaligned();
+  private static Logger LOG = LoggerFactory.getLogger(ByteBufferUtils.class);
 
   private ByteBufferUtils() {
   }
@@ -1152,7 +1155,18 @@ public final class ByteBufferUtils {
   public static void copyFromBufferToArray(byte[] out, ByteBuffer in, int sourceOffset,
       int destinationOffset, int length) {
     if (in.hasArray()) {
-      System.arraycopy(in.array(), sourceOffset + in.arrayOffset(), out, destinationOffset, length);
+      try {
+        System.arraycopy(in.array(), sourceOffset + in.arrayOffset(), out, destinationOffset, length);
+      } catch (Throwable e){
+        if (out == null) {
+          LOG.warn("[QINGGUI] out=null");
+        } else if (in == null || in.array() == null) {
+          LOG.warn("[QINGGUI] in=null");
+        } else {
+          LOG.warn("[QINGGUI] out.length={}, in.length={}, sourceOffset={} destinationOffset={} length={}",
+            out.length, in.array().length, sourceOffset, destinationOffset, length);
+        }
+      }
     } else if (UNSAFE_AVAIL) {
       UnsafeAccess.copy(in, sourceOffset, out, destinationOffset, length);
     } else {

{code}
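For context on what the patch above catches: the heap-backed branch copies with System.arraycopy, which throws ArrayIndexOutOfBoundsException whenever either the source or the destination range runs past its array. A minimal self-contained sketch (class name is mine, not HBase code) mirroring copyFromBufferToArray's heap branch:

{code:java}
import java.nio.ByteBuffer;

public class CopyBoundsDemo {
    // Mirrors the heap-backed branch of ByteBufferUtils.copyFromBufferToArray:
    // valid only while sourceOffset + length <= in capacity and
    // destinationOffset + length <= out.length.
    static void copyFromBufferToArray(byte[] out, ByteBuffer in,
            int sourceOffset, int destinationOffset, int length) {
        System.arraycopy(in.array(), sourceOffset + in.arrayOffset(),
                out, destinationOffset, length);
    }

    public static void main(String[] args) {
        ByteBuffer in = ByteBuffer.wrap(new byte[]{1, 2, 3, 4});
        byte[] out = new byte[2];
        copyFromBufferToArray(out, in, 1, 0, 2);      // in bounds: copies {2, 3}
        System.out.println(out[0] + "," + out[1]);
        try {
            copyFromBufferToArray(out, in, 3, 0, 2);  // length overruns the source
        } catch (ArrayIndexOutOfBoundsException e) {
            System.out.println("AIOOBE");             // same failure as in the log
        }
    }
}
{code}

In the reported trace the bad offsets come from a ByteBufferKeyValue whose qualifier position lies outside the backing buffer, so the copy fails before the cell can even be rendered for the assertion message.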
 

After re-deploying my patched binary, the failure drifts to the bloom filter building code.
{code:java}
12:54:39.966 [rs(${regionserver})-flush-proc-pool3-thread-2] WARN  org.apache.hadoop.hbase.regionserver.HStore - Failed flushing store file, retrying num=9
java.io.IOException: Added a key not lexically larger than previous. Current cell ...
        at org.apache.hadoop.hbase.util.BloomContext.sanityCheck(BloomContext.java:63) ~[hbase-server-2.1.5.jar:2.1.5]
        at org.apache.hadoop.hbase.util.BloomContext.writeBloom(BloomContext.java:54) ~[hbase-server-2.1.5.jar:2.1.5]
        at org.apache.hadoop.hbase.regionserver.StoreFileWriter.appendGeneralBloomfilter(StoreFileWriter.java:213) ~[hbase-server-2.1.5.jar:2.1.5]
        at org.apache.hadoop.hbase.regionserver.StoreFileWriter.append(StoreFileWriter.java:232) ~[hbase-server-2.1.5.jar:2.1.5]
        at org.apache.hadoop.hbase.regionserver.StoreFlusher.performFlush(StoreFlusher.java:133) ~[hbase-server-2.1.5.jar:2.1.5]
        at org.apache.hadoop.hbase.regionserver.DefaultStoreFlusher.flushSnapshot(DefaultStoreFlusher.java:69) ~[hbase-server-2.1.5.jar:2.1.5]
        at org.apache.hadoop.hbase.regionserver.HStore.flushCache(HStore.java:1021) [hbase-server-2.1.5.jar:2.1.5]
        at org.apache.hadoop.hbase.regionserver.HStore$StoreFlusherImpl.flushCache(HStore.java:2259) [hbase-server-2.1.5.jar:2.1.5]
        at org.apache.hadoop.hbase.regionserver.HRegion.internalFlushCacheAndCommit(HRegion.java:2733) [hbase-server-2.1.5.jar:2.1.5]
        at org.apache.hadoop.hbase.regionserver.HRegion.internalFlushcache(HRegion.java:2467) [hbase-server-2.1.5.jar:2.1.5]
        at org.apache.hadoop.hbase.regionserver.HRegion.internalFlushcache(HRegion.java:2439) [hbase-server-2.1.5.jar:2.1.5]
        at org.apache.hadoop.hbase.regionserver.HRegion.flushcache(HRegion.java:2329) [hbase-server-2.1.5.jar:2.1.5]
        at org.apache.hadoop.hbase.regionserver.HRegion.flush(HRegion.java:2232) [hbase-server-2.1.5.jar:2.1.5]
        at org.apache.hadoop.hbase.procedure.flush.FlushTableSubprocedure$RegionFlushTask.call(FlushTableSubprocedure.java:68) [hbase-server-2.1.5.jar:2.1.5]
        at org.apache.hadoop.hbase.procedure.flush.FlushTableSubprocedure$RegionFlushTask.call(FlushTableSubprocedure.java:56) [hbase-server-2.1.5.jar:2.1.5]
        at java.util.concurrent.FutureTask.run(FutureTask.java:266) [?:1.8.0_144]
        at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511) [?:1.8.0_144]
        at java.util.concurrent.FutureTask.run(FutureTask.java:266) [?:1.8.0_144]
        at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149) [?:1.8.0_144]
        at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624) [?:1.8.0_144]
        at java.lang.Thread.run(Thread.java:748) [?:1.8.0_144]
12:54:39.979 [rs(${regionserver})-flush-proc-pool3-thread-2] ERROR org.apache.hadoop.hbase.regionserver.HRegionServer - ***** ABORTING region server ${regionserver}: Replay of WAL required. Forcing server shutdown *****
org.apache.hadoop.hbase.DroppedSnapshotException: region: ${region}.
        at org.apache.hadoop.hbase.regionserver.HRegion.internalFlushCacheAndCommit(HRegion.java:2788) [hbase-server-2.1.5.jar:2.1.5]
        at org.apache.hadoop.hbase.regionserver.HRegion.internalFlushcache(HRegion.java:2467) [hbase-server-2.1.5.jar:2.1.5]
        at org.apache.hadoop.hbase.regionserver.HRegion.internalFlushcache(HRegion.java:2439) [hbase-server-2.1.5.jar:2.1.5]
        at org.apache.hadoop.hbase.regionserver.HRegion.flushcache(HRegion.java:2329) [hbase-server-2.1.5.jar:2.1.5]
        at org.apache.hadoop.hbase.regionserver.HRegion.flush(HRegion.java:2232) [hbase-server-2.1.5.jar:2.1.5]
        at org.apache.hadoop.hbase.procedure.flush.FlushTableSubprocedure$RegionFlushTask.call(FlushTableSubprocedure.java:68) [hbase-server-2.1.5.jar:2.1.5]
        at org.apache.hadoop.hbase.procedure.flush.FlushTableSubprocedure$RegionFlushTask.call(FlushTableSubprocedure.java:56) [hbase-server-2.1.5.jar:2.1.5]
        at java.util.concurrent.FutureTask.run(FutureTask.java:266) [?:1.8.0_144]
        at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511) [?:1.8.0_144]
        at java.util.concurrent.FutureTask.run(FutureTask.java:266) [?:1.8.0_144]
        at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149) [?:1.8.0_144]
        at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624) [?:1.8.0_144]
        at java.lang.Thread.run(Thread.java:748) [?:1.8.0_144]
Caused by: java.io.IOException: Added a key not lexically larger than previous. Current cell = ...
        at org.apache.hadoop.hbase.util.BloomContext.sanityCheck(BloomContext.java:63) ~[hbase-server-2.1.5.jar:2.1.5]
        at org.apache.hadoop.hbase.util.BloomContext.writeBloom(BloomContext.java:54) ~[hbase-server-2.1.5.jar:2.1.5]
        at org.apache.hadoop.hbase.regionserver.StoreFileWriter.appendGeneralBloomfilter(StoreFileWriter.java:213) ~[hbase-server-2.1.5.jar:2.1.5]
        at org.apache.hadoop.hbase.regionserver.StoreFileWriter.append(StoreFileWriter.java:232) ~[hbase-server-2.1.5.jar:2.1.5]
        at org.apache.hadoop.hbase.regionserver.StoreFlusher.performFlush(StoreFlusher.java:133) ~[hbase-server-2.1.5.jar:2.1.5]
        at org.apache.hadoop.hbase.regionserver.DefaultStoreFlusher.flushSnapshot(DefaultStoreFlusher.java:69) ~[hbase-server-2.1.5.jar:2.1.5]
        at org.apache.hadoop.hbase.regionserver.HStore.flushCache(HStore.java:1021) ~[hbase-server-2.1.5.jar:2.1.5]
        at org.apache.hadoop.hbase.regionserver.HStore$StoreFlusherImpl.flushCache(HStore.java:2259) ~[hbase-server-2.1.5.jar:2.1.5]
        at org.apache.hadoop.hbase.regionserver.HRegion.internalFlushCacheAndCommit(HRegion.java:2733) ~[hbase-server-2.1.5.jar:2.1.5]
        ... 12 more

{code}
 



--
This message was sent by Atlassian Jira
(v8.3.4#803005)