Posted to issues@hbase.apache.org by "Anoop Sam John (Jira)" <ji...@apache.org> on 2021/05/19 13:31:00 UTC

[jira] [Commented] (HBASE-24623) SIGSEGV v ~StubRoutines::jbyte_disjoint_arraycopy

    [ https://issues.apache.org/jira/browse/HBASE-24623?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17347682#comment-17347682 ] 

Anoop Sam John commented on HBASE-24623:
----------------------------------------

So this is not a case of early release of the BB (which is the case Duo mentioned).  That won't cause a JVM crash inside a memory copy.
I too faced this issue last week.   
I believe this happens when there is heavy memory usage on the RS side and lots of GC activity.  I could see the RS memory was >95%.
On the replication sink side, an RS receives the data to be replicated in the replicateWALEntry() call.  This is received into an offheap BB (Netty's, as NettyRpcServer is the default.  We won't copy from there to onheap for creating the CellScanner, right [~zhangduo]?)
The ReplicationSink then acts like an HBase client and issues a table.batch() call to write the replicated rows.  As part of this, we create CellBlocks.  This includes writing/encoding the Cells through KVCodec#Encoder.  So here we copy data from offheap to offheap (yes, the cellblock build uses DBB on the RS side), and that copy goes through the Unsafe memory copy API.  My guess is we might be hitting some JDK bug with this Unsafe copy when there is heavy memory usage and GC activity.
Thoughts?
There was a Jira that [~andrew.purtell@gmail.com] did for turning off the usage of Unsafe.  I am not able to remember that Jira id though.  I need to try it on my cluster and see whether we still see the issue.
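
For illustration only, here is a minimal sketch of the two kinds of copy discussed above (direct buffer to byte[], and direct buffer to direct buffer), written against the plain JDK ByteBuffer API with no Unsafe involved.  The class and method names are hypothetical and this is not the HBase ByteBufferUtils code; it only shows what a bounds-checked version of the same copies looks like, which may be a useful baseline when comparing against the Unsafe path.

```java
import java.nio.ByteBuffer;
import java.nio.charset.StandardCharsets;

// Hypothetical sketch, not HBase code: bounds-checked copies out of a direct buffer.
public class DirectBufferCopySketch {

  // Copy `length` bytes starting at absolute `srcOffset` from `src` into a new byte[].
  static byte[] copyToArray(ByteBuffer src, int srcOffset, int length) {
    byte[] out = new byte[length];
    ByteBuffer view = src.duplicate(); // independent position/limit, shared content
    view.clear();                      // position = 0, limit = capacity
    view.limit(srcOffset + length);    // throws IllegalArgumentException if out of range
    view.position(srcOffset);
    view.get(out, 0, length);
    return out;
  }

  // Copy the same range into a freshly allocated direct (offheap) buffer.
  static ByteBuffer copyToDirect(ByteBuffer src, int srcOffset, int length) {
    ByteBuffer dst = ByteBuffer.allocateDirect(length);
    ByteBuffer view = src.duplicate();
    view.clear();
    view.limit(srcOffset + length);
    view.position(srcOffset);
    dst.put(view);                     // bulk offheap-to-offheap copy
    dst.flip();                        // make the copied bytes readable
    return dst;
  }

  public static void main(String[] args) {
    ByteBuffer src = ByteBuffer.allocateDirect(16);
    src.put("row1:cf:qual".getBytes(StandardCharsets.UTF_8));
    byte[] qualifier = copyToArray(src, 8, 4);
    System.out.println(new String(qualifier, StandardCharsets.UTF_8));
    System.out.println(copyToDirect(src, 0, 4).remaining());
  }
}
```

Both helpers work on a duplicate() of the source, so the caller's position and limit are left untouched, and an out-of-range offset or length fails with a Java exception instead of reading arbitrary memory.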


> SIGSEGV v  ~StubRoutines::jbyte_disjoint_arraycopy
> --------------------------------------------------
>
>                 Key: HBASE-24623
>                 URL: https://issues.apache.org/jira/browse/HBASE-24623
>             Project: HBase
>          Issue Type: Bug
>    Affects Versions: 2.3.0
>            Reporter: Michael Stack
>            Priority: Major
>
> In testing, 1% of a decent cluster went down with this seg fault in the vm:
> {code}
> # A fatal error has been detected by the Java Runtime Environment:
> #
> #  SIGSEGV (0xb) at pc=0x00007f6659052410, pid=37208, tid=0x00007f3c89453700
> #
> # JRE version: OpenJDK Runtime Environment (8.0_232-b09) (build 1.8.0_232-b09)
> # Java VM: OpenJDK 64-Bit Server VM (25.232-b09 mixed mode linux-amd64 )
> # Problematic frame:
> # v  ~StubRoutines::jbyte_disjoint_arraycopy
> {code}
> Looking in the hs_err log, the crash happens in the same area. Here are a few of the stack traces:
> {code}
> Stack: [0x00007f3c89353000,0x00007f3c89454000],  sp=0x00007f3c89452110,  free space=1020k
> Native frames: (J=compiled Java code, j=interpreted, Vv=VM code, C=native code)
> v  ~StubRoutines::jbyte_disjoint_arraycopy
> J 17674 C2 org.apache.hadoop.hbase.util.ByteBufferUtils.copyFromBufferToArray([BLjava/nio/ByteBuffer;III)V (69 bytes) @ 0x00007f665af000d1 [0x00007f665aefffe0+0xf1]
> J 17732 C1 org.apache.hadoop.hbase.CellUtil.copyQualifierTo(Lorg/apache/hadoop/hbase/Cell;[BI)I (59 bytes) @ 0x00007f665bc440dc [0x00007f665bc43b80+0x55c]
> j  org.apache.hadoop.hbase.CellUtil.cloneQualifier(Lorg/apache/hadoop/hbase/Cell;)[B+12
> J 22278 C2 org.apache.hadoop.hbase.ByteBufferKeyValue.getQualifierArray()[B (5 bytes) @ 0x00007f6659bd4784 [0x00007f6659bd4760+0x24]
> j  org.apache.hadoop.hbase.CellUtil.getCellKeyAsString(Lorg/apache/hadoop/hbase/Cell;Ljava/util/function/Function;)Ljava/lang/String;+97
> j  org.apache.hadoop.hbase.CellUtil.getCellKeyAsString(Lorg/apache/hadoop/hbase/Cell;)Ljava/lang/String;+6
> j  org.apache.hadoop.hbase.CellUtil.toString(Lorg/apache/hadoop/hbase/Cell;Z)Ljava/lang/String;+16
> j  org.apache.hadoop.hbase.ByteBufferKeyValue.toString()Ljava/lang/String;+2
> j  org.apache.hadoop.hbase.client.Mutation.add(Lorg/apache/hadoop/hbase/Cell;)Lorg/apache/hadoop/hbase/client/Mutation;+28
> J 22605 C2 org.apache.hadoop.hbase.client.Put.add(Lorg/apache/hadoop/hbase/Cell;)Lorg/apache/hadoop/hbase/client/Put; (8 bytes) @ 0x00007f665a982a04 [0x00007f665a9829e0+0x24]
> J 22112 C2 org.apache.hadoop.hbase.shaded.protobuf.ProtobufUtil.toPut(Lorg/apache/hadoop/hbase/shaded/protobuf/generated/ClientProtos$MutationProto;Lorg/apache/hadoop/hbase/CellScanner;)Lorg/apache/hadoop/hbase/client/Put; (910 bytes) @ 0x00007f665c706700 [0x00007f665c706000+0x700]
> J 24084 C2 org.apache.hadoop.hbase.regionserver.RSRpcServices.doBatchOp(Lorg/apache/hadoop/hbase/shaded/protobuf/generated/ClientProtos$RegionActionResult$Builder;Lorg/apache/hadoop/hbase/regionserver/HRegion;Lorg/apache/hadoop/hbase/quotas/OperationQuota;Ljava/util/List;Lorg/apache/hadoop/hbase/CellScanner;Lorg/apache/hadoop/hbase/quotas/ActivePolicyEnforcement;Z)V (646 bytes) @ 0x00007f665cc21100 [0x00007f665cc20c80+0x480]
> J 14696 C2 org.apache.hadoop.hbase.regionserver.RSRpcServices.doNonAtomicRegionMutation(Lorg/apache/hadoop/hbase/regionserver/HRegion;Lorg/apache/hadoop/hbase/quotas/OperationQuota;Lorg/apache/hadoop/hbase/shaded/protobuf/generated/ClientProtos$RegionAction;Lorg/apache/hadoop/hbase/CellScanner;Lorg/apache/hadoop/hbase/shaded/protobuf/generated/ClientProtos$RegionActionResult$Builder;Ljava/util/List;JLorg/apache/hadoop/hbase/regionserver/RSRpcServices$RegionScannersCloseCallBack;Lorg/apache/hadoop/hbase/ipc/RpcCallContext;Lorg/apache/hadoop/hbase/quotas/ActivePolicyEnforcement;)Ljava/util/List; (901 bytes) @ 0x00007f665b722148 [0x00007f665b7218e0+0x868]
> {code}
> Here's another:
> {code}
> Stack: [0x00007edd015e2000,0x00007edd016e3000],  sp=0x00007edd016e11b0,  free space=1020k
> Native frames: (J=compiled Java code, j=interpreted, Vv=VM code, C=native code)
> v  ~StubRoutines::jbyte_disjoint_arraycopy
> J 18255 C2 org.apache.hadoop.hbase.util.ByteBufferUtils.copyFromBufferToArray([BLjava/nio/ByteBuffer;III)V (69 bytes) @ 0x00007f06d2593551 [0x00007f06d2593460+0xf1]
> j  org.apache.hadoop.hbase.PrivateCellUtil.copyTagsTo(Lorg/apache/hadoop/hbase/Cell;[BI)I+31
> j  org.apache.hadoop.hbase.CellUtil.cloneTags(Lorg/apache/hadoop/hbase/Cell;)[B+12
> j  org.apache.hadoop.hbase.ByteBufferKeyValue.getTagsArray()[B+1
> j  org.apache.hadoop.hbase.CellUtil.toString(Lorg/apache/hadoop/hbase/Cell;Z)Ljava/lang/String;+40
> j  org.apache.hadoop.hbase.ByteBufferKeyValue.toString()Ljava/lang/String;+2
> j  org.apache.hadoop.hbase.client.Mutation.add(Lorg/apache/hadoop/hbase/Cell;)Lorg/apache/hadoop/hbase/client/Mutation;+28
> J 24361 C2 org.apache.hadoop.hbase.client.Put.add(Lorg/apache/hadoop/hbase/Cell;)Lorg/apache/hadoop/hbase/client/Put; (8 bytes) @ 0x00007f06d1c04d04 [0x00007f06d1c04ce0+0x24]
> J 24273 C2 org.apache.hadoop.hbase.shaded.protobuf.ProtobufUtil.toPut(Lorg/apache/hadoop/hbase/shaded/protobuf/generated/ClientProtos$MutationProto;Lorg/apache/hadoop/hbase/CellScanner;)Lorg/apache/hadoop/hbase/client/Put; (910 bytes) @ 0x00007f06d4de48b4 [0x00007f06d4de40e0+0x7d4]
> ...
> {code}
> And here...
> {code}
> Stack: [0x00007f63d89ba000,0x00007f63d8abb000],  sp=0x00007f63d8ab9170,  free space=1020k
> Native frames: (J=compiled Java code, j=interpreted, Vv=VM code, C=native code)
> v  ~StubRoutines::jbyte_disjoint_arraycopy
> J 22303 C2 org.apache.hadoop.hbase.ByteBufferKeyValue.getQualifierArray()[B (5 bytes) @ 0x00007f8dac8dc067 [0x00007f8dac8dbae0+0x587]
> j  org.apache.hadoop.hbase.CellUtil.getCellKeyAsString(Lorg/apache/hadoop/hbase/Cell;Ljava/util/function/Function;)Ljava/lang/String;+97
> j  org.apache.hadoop.hbase.CellUtil.getCellKeyAsString(Lorg/apache/hadoop/hbase/Cell;)Ljava/lang/String;+6
> j  org.apache.hadoop.hbase.CellUtil.toString(Lorg/apache/hadoop/hbase/Cell;Z)Ljava/lang/String;+16
> j  org.apache.hadoop.hbase.ByteBufferKeyValue.toString()Ljava/lang/String;+2
> j  org.apache.hadoop.hbase.client.Mutation.add(Lorg/apache/hadoop/hbase/Cell;)Lorg/apache/hadoop/hbase/client/Mutation;+28
> j  org.apache.hadoop.hbase.client.Put.add(Lorg/apache/hadoop/hbase/Cell;)Lorg/apache/hadoop/hbase/client/Put;+2
> ....
> {code}
> It's this bit of code in Mutation, hit while processing a large multi request:
> {code}
>   Mutation add(Cell cell) throws IOException {
>     //Checking that the row of the kv is the same as the mutation
>     // TODO: It is fraught with risk if user pass the wrong row.
>     // Throwing the IllegalArgumentException is more suitable I'd say.
>     if (!CellUtil.matchingRows(cell, this.row)) {
>       throw new WrongRowIOException("The row in " + cell.toString() +
>         " doesn't match the original one " +  Bytes.toStringBinary(this.row));
>     }
> ...
> {code}
> It's the call to 'cell.toString()', seemingly each time.
> Oh, I can't reproduce at least with basic messing.


