Posted to issues@hbase.apache.org by "Zheng Hu (JIRA)" <ji...@apache.org> on 2018/11/09 02:07:00 UTC

[jira] [Commented] (HBASE-21379) RegionServer Stop by ArrayIndexOutOfBoundsException of WAL when replication enabled

    [ https://issues.apache.org/jira/browse/HBASE-21379?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16680735#comment-16680735 ] 

Zheng Hu commented on HBASE-21379:
----------------------------------

Talked with justice: after applying HBASE-21401 to their test cluster, the ArrayIndexOutOfBoundsException still happens. So it seems the client is not the one messing up the bytes, and we can basically conclude that the region server messes up the bytes in some scenarios.
Still need more time to dig into this.
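
For readers following the stack trace: the exception is thrown from KeyValue.getFamilyLength, which locates the family-length byte by skipping over the row using the row length stored in the cell bytes. The following is a minimal sketch of that failure mode, assuming a simplified layout (4-byte key length, 4-byte value length, 2-byte row length, row, 1-byte family length, family). It is not HBase code: the class KeyValueLayoutSketch and its methods are hypothetical, and the real key also carries a qualifier, timestamp and type. It only illustrates how a corrupted length field can push an unchecked array read out of bounds.

```java
import java.nio.ByteBuffer;
import java.nio.charset.StandardCharsets;

// Hypothetical, simplified stand-in for a serialized cell; not the HBase KeyValue class.
final class KeyValueLayoutSketch {
    // Assumed layout: [int keyLength][int valueLength][short rowLength][row][byte familyLength][family]
    static final int ROW_OFFSET = 8;      // two 4-byte length fields precede the key
    static final int ROW_LENGTH_SIZE = 2; // row length is stored as a short

    // Serialize a row and family into the assumed layout (no qualifier, timestamp, type or value).
    static byte[] encode(String row, String family) {
        byte[] r = row.getBytes(StandardCharsets.UTF_8);
        byte[] f = family.getBytes(StandardCharsets.UTF_8);
        int keyLength = ROW_LENGTH_SIZE + r.length + 1 + f.length;
        ByteBuffer buf = ByteBuffer.allocate(ROW_OFFSET + keyLength);
        buf.putInt(keyLength).putInt(0)
           .putShort((short) r.length).put(r)
           .put((byte) f.length).put(f);
        return buf.array();
    }

    // Read the row length stored right after the two int length fields.
    static short rowLength(byte[] bytes, int offset) {
        return ByteBuffer.wrap(bytes).getShort(offset + ROW_OFFSET);
    }

    // Read the family length by skipping the row; there is no bounds check,
    // so a corrupted row length makes this index run past the end of the array.
    static byte familyLength(byte[] bytes, int offset) {
        int familyLengthPos = offset + ROW_OFFSET + ROW_LENGTH_SIZE + rowLength(bytes, offset);
        return bytes[familyLengthPos];
    }

    public static void main(String[] args) {
        byte[] cell = encode("80000000", "f");
        System.out.println("family length of intact cell: " + familyLength(cell, 0));

        // Simulate corruption: overwrite the 2-byte row length with 8000 (0x1F40).
        cell[ROW_OFFSET] = 0x1F;
        cell[ROW_OFFSET + 1] = 0x40;
        try {
            familyLength(cell, 0);
        } catch (ArrayIndexOutOfBoundsException e) {
            System.out.println("corrupted cell: " + e.getClass().getSimpleName());
        }
    }
}
```

In this sketch the bytes are damaged after encoding, so the reader fails no matter who reads them. That matches what the issue reports, where both the replication WAL reader and the WAL pretty printer fail on the same file, but the sketch says nothing about where the real corruption is introduced.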

> RegionServer Stop by ArrayIndexOutOfBoundsException of WAL when replication enabled
> -----------------------------------------------------------------------------------
>
>                 Key: HBASE-21379
>                 URL: https://issues.apache.org/jira/browse/HBASE-21379
>             Project: HBase
>          Issue Type: Bug
>          Components: Replication
>    Affects Versions: 2.0.0
>            Reporter: justice
>            Assignee: Zheng Hu
>            Priority: Major
>         Attachments: hbase-wal-p.tgz, log.tgz, wal-20181026.tgz, wal.tgz
>
>
> The log is as follows:
> {code:java}
> // code placeholder
> 2018-10-24 09:22:42,381 INFO  [regionserver/11-3-19-10:16020] wal.AbstractFSWAL: New WAL /hbase/WALs/11-3-19-10.jd.local,16020,1540344155469/11-3-19-10.jd.local%2C16020%2C1540344155469.1540344162124
> 2018-10-24 09:23:05,151 ERROR [regionserver/11-3-19-10:16020.replicationSource.11-3-19-10.jd.local%2C16020%2C1540344155469,2.replicationSource.wal-reader.11-3-19-10.jd.local%2C16020%2C1540344155469,2] regionserver.ReplicationSource: Unexpected exception in regionserver/11-3-19-10:16020.replicationSource.11-3-19-10.jd.local%2C16020%2C1540344155469,2.replicationSource.wal-reader.11-3-19-10.jd.local%2C16020%2C1540344155469,2 currentPath=hdfs://11-3-18-67.JD.LOCAL:9000/hbase/WALs/11-3-19-10.jd.local,16020,1540344155469/11-3-19-10.jd.local%2C16020%2C1540344155469.1540344162124
> java.lang.ArrayIndexOutOfBoundsException: 8830
> at org.apache.hadoop.hbase.KeyValue.getFamilyLength(KeyValue.java:1365)
> at org.apache.hadoop.hbase.KeyValue.getFamilyLength(KeyValue.java:1358)
> at org.apache.hadoop.hbase.CellUtil.cloneFamily(CellUtil.java:114)
> at org.apache.hadoop.hbase.replication.ScopeWALEntryFilter.filterCell(ScopeWALEntryFilter.java:54)
> at org.apache.hadoop.hbase.replication.ChainWALEntryFilter.filterCells(ChainWALEntryFilter.java:90)
> at org.apache.hadoop.hbase.replication.ChainWALEntryFilter.filter(ChainWALEntryFilter.java:77)
> at org.apache.hadoop.hbase.replication.regionserver.ReplicationSourceWALReader.filterEntry(ReplicationSourceWALReader.java:234)
> at org.apache.hadoop.hbase.replication.regionserver.ReplicationSourceWALReader.readWALEntries(ReplicationSourceWALReader.java:170)
> at org.apache.hadoop.hbase.replication.regionserver.ReplicationSourceWALReader.run(ReplicationSourceWALReader.java:133)
> 2018-10-24 09:23:05,153 INFO [regionserver/11-3-19-10:16020.replicationSource.11-3-19-10.jd.local%2C16020%2C1540344155469,2.replicationSource.wal-reader.11-3-19-10.jd.local%2C16020%2C1540344155469,2] regionserver.HRegionServer: ***** STOPPING region server '11-3-19-10.jd.local,16020,1540344155469' *****
> {code}
> Output of hbase wal -p:
> {code:java}
> // code placeholder
> writer Classes: ProtobufLogWriter AsyncProtobufLogWriter
> Cell Codec Class: org.apache.hadoop.hbase.regionserver.wal.WALCellCodec
> Sequence=15 , region=fee7a9465ced6ce9e319d37e9d71c63c at write timestamp=Wed Oct 24 09:22:49 CST 2018
> row=80000000, column=METAFAMILY:HBASE::REGION_EVENT
> value: \x08\x00\x12\x1Cmlaas:ump_host_second_181029\x1A fee7a9465ced6ce9e319d37e9d71c63c \x0E*\x06\x0A\x01f\x12\x01f2\x1F\x0A\x1311-3-19-10.JD.LOCAL\x10\x94}\x18\xCD\x9A\xAA\x9D\xEA,:Umlaas:ump_host_second_181029,80000000,1540271129253.fee7a9465ced6ce9e319d37e9d71c63c.
> Sequence=9 , region=ba6684888d826328a6373435124dc1cd at write timestamp=Wed Oct 24 09:22:49 CST 2018
> row=91000000, column=METAFAMILY:HBASE::REGION_EVENT
> ...
> row=34975#00, column=f:\x09,
> value: {"tp50":1,"avg":2,"min":0,"tp90":1,"max":3,"count":13,"tp99":2,"tp999":2,"error":0}
> row=349824#00, column=f:\x08\xFA
> value: {"tp50":2,"avg":2,"min":0,"tp90":2,"max":98,"count":957,"tp99":3,"tp999":34,"error":0}
> row=349824#00, column=f:\x08\xD2
> value: {"tp50":2,"avg":2,"min":0,"tp90":2,"max":43,"count":1842,"tp99":2,"tp999":31,"error":0}
> Exception in thread "main" java.lang.ArrayIndexOutOfBoundsException: 8830
> at org.apache.hadoop.hbase.KeyValue.getFamilyLength(KeyValue.java:1365)
> at org.apache.hadoop.hbase.KeyValue.getFamilyLength(KeyValue.java:1358)
> at org.apache.hadoop.hbase.wal.WALPrettyPrinter.toStringMap(WALPrettyPrinter.java:336)
> at org.apache.hadoop.hbase.wal.WALPrettyPrinter.processFile(WALPrettyPrinter.java:290)
> at org.apache.hadoop.hbase.wal.WALPrettyPrinter.run(WALPrettyPrinter.java:421)
> at org.apache.hadoop.hbase.wal.WALPrettyPrinter.main(WALPrettyPrinter.java:356)
> {code}
>  



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)