Posted to dev@hbase.apache.org by "Victor Xu (JIRA)" <ji...@apache.org> on 2013/10/06 02:36:41 UTC

[jira] [Resolved] (HBASE-9646) SplitLogWorker throws IndexOutOfBoundsException while reading compressed hlog

     [ https://issues.apache.org/jira/browse/HBASE-9646?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Victor Xu resolved HBASE-9646.
------------------------------

      Resolution: Duplicate
    Release Note: 
Same as HBASE-9645.
Both issues are caused by the HRegion's updatesLock.

> SplitLogWorker throws IndexOutOfBoundsException while reading compressed hlog
> -----------------------------------------------------------------------------
>
>                 Key: HBASE-9646
>                 URL: https://issues.apache.org/jira/browse/HBASE-9646
>             Project: HBase
>          Issue Type: Bug
>          Components: wal
>    Affects Versions: 0.94.10
>         Environment: Linux 2.6.32-el5.x86_64
>            Reporter: Victor Xu
>
> This case happened right after a regionserver halt ([HBASE-9645|https://issues.apache.org/jira/browse/HBASE-9645]).
> As we know, when an RS goes down, the master directs other live RSs to split the hlogs left behind by the dead RS. Then, after all the splitting finishes, it reassigns the regions to other live RSs.
> The case described here is that one of the dead RS's hlog files could not be split. The master resubmitted the splitting task for this hlog several times to different RSs, but all of them failed.
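>
> For context, the split handoff happens over ZooKeeper: the master publishes one task znode per hlog under /hbase/splitlog, and each live RS's SplitLogWorker races to claim the tasks, which is why the same file kept landing on different workers. Below is a minimal sketch of that claim loop; ZkClient and TaskState are hypothetical stand-ins, not the real 0.94 SplitLogWorker code:
> {code:java}
> // Illustrative only: ZkClient/TaskState are stand-ins for the real
> // SplitLogWorker's ZooKeeper plumbing.
> import java.util.List;
>
> enum TaskState { UNASSIGNED, OWNED, DONE, ERR }
>
> interface ZkClient {
>   List<String> getChildren(String path);
>   // Versioned compare-and-set so only one worker can claim a task.
>   boolean compareAndSwap(String path, TaskState expect, TaskState update, String owner);
> }
>
> class SplitWorkerSketch {
>   private final ZkClient zk;
>   private final String worker;  // e.g. "hadoop250.cm6,60020,1378886880990"
>
>   SplitWorkerSketch(ZkClient zk, String worker) { this.zk = zk; this.worker = worker; }
>
>   void taskLoop() throws Exception {
>     for (String task : zk.getChildren("/hbase/splitlog")) {
>       String path = "/hbase/splitlog/" + task;
>       if (!zk.compareAndSwap(path, TaskState.UNASSIGNED, TaskState.OWNED, worker)) {
>         continue;  // another worker claimed it first
>       }
>       try {
>         splitLogFile(task);  // replay edits into per-region recovered.edits files
>         zk.compareAndSwap(path, TaskState.OWNED, TaskState.DONE, worker);
>       } catch (Exception e) {
>         // The master sees ERR and resubmits the task to another worker,
>         // hence the repeated failures across RSs for this one hlog.
>         zk.compareAndSwap(path, TaskState.OWNED, TaskState.ERR, worker);
>       }
>     }
>   }
>
>   private void splitLogFile(String encodedHlogName) throws Exception { /* read the hlog */ }
> }
> {code}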
> I checked one of those RSs' regionserver logs and found a read error:
> {noformat}
> 2013-09-19 15:54:53,082 INFO org.apache.hadoop.hbase.regionserver.SplitLogWorker: worker hadoop250.cm6,60020,1378886880990 acquired task /hbase/splitlog/hdfs%3A%2F%2Fhadoopnnvip.cm6%3A9000%2Fhbase%2F.logs%2Fhadoop230.cm6%2C60020%2C1378886880970-splitting%2Fhadoop230.cm6%252C60020%252C1378886880970.1379572773730
> 2013-09-19 15:54:53,083 INFO org.apache.hadoop.hbase.regionserver.wal.HLogSplitter: Splitting hlog: hdfs://hadoopnnvip.cm6:9000/hbase/.logs/hadoop230.cm6,60020,1378886880970-splitting/hadoop230.cm6%2C60020%2C1378886880970.1379572773730, length=1169560315
> 2013-09-19 15:54:53,083 INFO org.apache.hadoop.hbase.util.FSHDFSUtils: Recovering file hdfs://hadoopnnvip.cm6:9000/hbase/.logs/hadoop230.cm6,60020,1378886880970-splitting/hadoop230.cm6%2C60020%2C1378886880970.1379572773730
> 2013-09-19 15:54:53,084 INFO org.apache.hadoop.hbase.util.FSHDFSUtils: Finished lease recovery attempt for hdfs://hadoopnnvip.cm6:9000/hbase/.logs/hadoop230.cm6,60020,1378886880970-splitting/hadoop230.cm6%2C60020%2C1378886880970.1379572773730
> 2013-09-19 15:54:53,166 WARN org.apache.hadoop.hbase.regionserver.wal.HLogSplitter: Found existing old edits file. It could be the result of a previous failed split attempt. Deleting hdfs://hadoopnnvip.cm6:9000/hbase/page_content_queue/1af5b36fd3695e18ef85ea6d768a3e45/recovered.edits/0000000080485091467, length=71240
> 2013-09-19 15:54:53,177 WARN org.apache.hadoop.hbase.regionserver.wal.HLogSplitter: Found existing old edits file. It could be the result of a previous failed split attempt. Deleting hdfs://hadoopnnvip.cm6:9000/hbase/offline_dashboard_queue/2514b6f61f05a2616f1e3dd893a56da6/recovered.edits/0000000080485091468, length=1242
> 2013-09-19 15:54:53,188 WARN org.apache.hadoop.hbase.regionserver.wal.HLogSplitter: Found existing old edits file. It could be the result of a previous failed split attempt. Deleting hdfs://hadoopnnvip.cm6:9000/hbase/taobao_auction_rowkey_queue/45ba2b931f94d4d453977ea16bccda6b/recovered.edits/0000000080485091472, length=704
> 2013-09-19 15:54:53,204 WARN org.apache.hadoop.hbase.regionserver.wal.HLogSplitter: Found existing old edits file. It could be the result of a previous failed split attempt. Deleting hdfs://hadoopnnvip.cm6:9000/hbase/shangpin_alt_taoke_doc_queue/aabc3dc0ab1c9c1ead1239b6be4234ed/recovered.edits/0000000080485091471, length=8998
> 2013-09-19 15:54:53,216 WARN org.apache.hadoop.hbase.regionserver.wal.HLogSplitter: Found existing old edits file. It could be the result of a previous failed split attempt. Deleting hdfs://hadoopnnvip.cm6:9000/hbase/link_rowkey/eb004b12a0e1d3f467cadb19a53ca29e/recovered.edits/0000000080485091462, length=612
> 2013-09-19 15:54:53,219 INFO org.apache.hadoop.hbase.regionserver.wal.HLogSplitter: Processed 13 edits across 5 regions threw away edits for 0 regions; log file=hdfs://hadoopnnvip.cm6:9000/hbase/.logs/hadoop230.cm6,60020,1378886880970-splitting/hadoop230.cm6%2C60020%2C1378886880970.1379572773730 is corrupted = false progress failed = false
> 2013-09-19 15:54:53,219 INFO org.apache.hadoop.hbase.regionserver.SplitLogWorker: worker hadoop250.cm6,60020,1378886880990 done with task /hbase/splitlog/hdfs%3A%2F%2Fhadoopnnvip.cm6%3A9000%2Fhbase%2F.logs%2Fhadoop230.cm6%2C60020%2C1378886880970-splitting%2Fhadoop230.cm6%252C60020%252C1378886880970.1379572773730 in 137ms
> 2013-09-19 15:54:53,219 ERROR org.apache.hadoop.hbase.regionserver.SplitLogWorker: unexpected error 
> java.lang.IndexOutOfBoundsException: index (5) must be less than size (5)
>         at com.google.common.base.Preconditions.checkElementIndex(Preconditions.java:301)
>         at com.google.common.base.Preconditions.checkElementIndex(Preconditions.java:280)
>         at org.apache.hadoop.hbase.regionserver.wal.LRUDictionary$BidirectionalLRUMap.get(LRUDictionary.java:122)
>         at org.apache.hadoop.hbase.regionserver.wal.LRUDictionary$BidirectionalLRUMap.access$000(LRUDictionary.java:69)
>         at org.apache.hadoop.hbase.regionserver.wal.LRUDictionary.getEntry(LRUDictionary.java:40)
>         at org.apache.hadoop.hbase.regionserver.wal.Compressor.readCompressed(Compressor.java:111)
>         at org.apache.hadoop.hbase.regionserver.wal.HLogKey.readFields(HLogKey.java:307)
>         at org.apache.hadoop.io.SequenceFile$Reader.next(SequenceFile.java:2199)
>         at org.apache.hadoop.io.SequenceFile$Reader.next(SequenceFile.java:2239)
>         at org.apache.hadoop.hbase.regionserver.wal.SequenceFileLogReader.next(SequenceFileLogReader.java:245)
>         at org.apache.hadoop.hbase.regionserver.wal.SequenceFileLogReader.next(SequenceFileLogReader.java:214)
>         at org.apache.hadoop.hbase.regionserver.wal.HLogSplitter.getNextLogLine(HLogSplitter.java:813)
>         at org.apache.hadoop.hbase.regionserver.wal.HLogSplitter.splitLogFile(HLogSplitter.java:437)
>         at org.apache.hadoop.hbase.regionserver.wal.HLogSplitter.splitLogFile(HLogSplitter.java:381)
>         at org.apache.hadoop.hbase.regionserver.SplitLogWorker$1.exec(SplitLogWorker.java:112)
>         at org.apache.hadoop.hbase.regionserver.SplitLogWorker.grabTask(SplitLogWorker.java:280)
>         at org.apache.hadoop.hbase.regionserver.SplitLogWorker.taskLoop(SplitLogWorker.java:211)
>         at org.apache.hadoop.hbase.regionserver.SplitLogWorker.run(SplitLogWorker.java:179)
>         at java.lang.Thread.run(Thread.java:662)
> 2013-09-19 15:54:53,219 INFO org.apache.hadoop.hbase.regionserver.SplitLogWorker: SplitLogWorker hadoop250.cm6,60020,1378886880990 exiting
> {noformat}
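>
> The failing frame is the LRU dictionary behind WAL compression: the writer stores each new byte[] once in full and afterwards emits only a short dictionary index, and the reader rebuilds the same dictionary entry by entry while scanning the log. Here is a simplified sketch of that scheme (DictSketch is illustrative, not the actual Compressor/LRUDictionary code, and the real wire format differs); it shows why a record that grew the writer's dictionary but never reached the file leaves the reader with an out-of-range index:
> {code:java}
> import java.io.*;
> import java.util.ArrayList;
> import java.util.List;
>
> // Simplified dictionary compression in the spirit of the WAL's LRUDictionary:
> // writer and reader must add entries in lockstep, so one lost record on the
> // writer side leaves the reader's dictionary permanently one index short.
> public final class DictSketch {
>   private final List<byte[]> entries = new ArrayList<byte[]>();
>
>   // Writer side: emit (-1, length, bytes) for a brand-new entry,
>   // or just the existing dictionary index for a repeat.
>   void write(DataOutput out, byte[] value, Integer knownIndex) throws IOException {
>     if (knownIndex == null) {
>       out.writeShort(-1);
>       out.writeShort(value.length);
>       out.write(value);
>       entries.add(value);   // the writer's dictionary grows here
>     } else {
>       out.writeShort(knownIndex);
>     }
>   }
>
>   // Reader side: mirror the writer. If the writer added an entry whose
>   // record never made it into the file, a later index points past our size.
>   byte[] read(DataInput in) throws IOException {
>     short idx = in.readShort();
>     if (idx == -1) {
>       byte[] value = new byte[in.readShort()];
>       in.readFully(value);
>       entries.add(value);
>       return value;
>     }
>     if (idx >= entries.size()) {  // the "index (5) must be less than size (5)" case
>       throw new IndexOutOfBoundsException(
>           "index (" + idx + ") must be less than size (" + entries.size() + ")");
>     }
>     return entries.get(idx);
>   }
> }
> {code}
> The reader stays in sync only if it sees every record the writer dictionary-encoded, which is exactly the invariant the missing sequence id below turns out to break.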
> I then used the HLog dump tool to analyze the hlog file. Here is what I got:
> {noformat}
> $ hbase org.apache.hadoop.hbase.regionserver.wal.HLog --dump /user/hadoop/hbase_admin/logs/hadoop230.cm6%2C60020%2C1378886880970.1379572773730
> 13/09/24 14:24:38 WARN conf.Configuration: fs.default.name is deprecated. Instead, use fs.defaultFS
> Sequence 80485091459 from region aabc3dc0ab1c9c1ead1239b6be4234ed in table shangpin_alt_taoke_doc_queue
>   Action:
>     row: 00015:0000001379572773802:00001
>     column: message:789
>     at time: Thu Sep 19 14:39:33 CST 2013
> Sequence 80485091460 from region aabc3dc0ab1c9c1ead1239b6be4234ed in table shangpin_alt_taoke_doc_queue
>   Action:
>     row: 00015:0000001379572773799:00001
>     column: message:789
>     at time: Thu Sep 19 14:39:33 CST 2013
> Sequence 80485091461 from region eb004b12a0e1d3f467cadb19a53ca29e in table link_rowkey
>   Action:
>     row: 191:1379572773794
>     column: link_crawler:15acca08d58e374208eb603aaa4c58e8_1379572773791_345445
>     at time: Thu Sep 19 14:39:33 CST 2013
> Sequence 80485091462 from region eb004b12a0e1d3f467cadb19a53ca29e in table link_rowkey
>   Action:
>     row: 191:1379572773793
>     column: link_crawler:f8a52947d7a0e794204f118fc7c5fb5a_1379572773791_916104
>     at time: Thu Sep 19 14:39:33 CST 2013
> Sequence 80485091463 from region 1af5b36fd3695e18ef85ea6d768a3e45 in table page_content_queue
>   Action:
>     row: 00039:0000001379572773776:00001
>     column: message:content
>     at time: Thu Sep 19 14:39:33 CST 2013
> Sequence 80485091464 from region 2514b6f61f05a2616f1e3dd893a56da6 in table offline_dashboard_queue
>   Action:
>     row: 00082:0000001379572773772:00001
>     column: message:parser
>     at time: Thu Sep 19 14:39:33 CST 2013
> Sequence 80485091465 from region 2514b6f61f05a2616f1e3dd893a56da6 in table offline_dashboard_queue
>   Action:
>     row: 00082:0000001379572773770:00001
>     column: message:parser
>     at time: Thu Sep 19 14:39:33 CST 2013
> Sequence 80485091466 from region 45ba2b931f94d4d453977ea16bccda6b in table taobao_auction_rowkey_queue
>   Action:
>     row: 00095:0000001379572773768:00001
>     column: message:taobao_auction
>     at time: Thu Sep 19 14:39:33 CST 2013
>   Action:
>     row: 00095:0000001379572773768:00002
>     column: message:taobao_auction
>     at time: Thu Sep 19 14:39:33 CST 2013
>   Action:
>     row: 00095:0000001379572773768:00003
>     column: message:taobao_auction
>     at time: Thu Sep 19 14:39:33 CST 2013
> Sequence 80485091467 from region 1af5b36fd3695e18ef85ea6d768a3e45 in table page_content_queue
>   Action:
>     row: 00039:0000001379572773766:00001
>     column: message:content
>     at time: Thu Sep 19 14:39:33 CST 2013
> Sequence 80485091468 from region 2514b6f61f05a2616f1e3dd893a56da6 in table offline_dashboard_queue
>   Action:
>     row: 00082:0000001379572773759:00001
>     column: message:parser
>     at time: Thu Sep 19 14:39:33 CST 2013
> Sequence 80485091469 from region aabc3dc0ab1c9c1ead1239b6be4234ed in table shangpin_alt_taoke_doc_queue
>   Action:
>     row: 00015:0000001379572773756:00001
>     column: message:789
>     at time: Thu Sep 19 14:39:33 CST 2013
> Sequence 80485091471 from region aabc3dc0ab1c9c1ead1239b6be4234ed in table shangpin_alt_taoke_doc_queue
>   Action:
>     row: 00015:0000001379572773755:00001
>     column: message:789
>     at time: Thu Sep 19 14:39:33 CST 2013
> Sequence 80485091472 from region 45ba2b931f94d4d453977ea16bccda6b in table taobao_auction_rowkey_queue
>   Action:
>     row: 00095:0000001379572773754:00001
>     column: message:taobao_auction
>     at time: Thu Sep 19 14:39:33 CST 2013
>   Action:
>     row: 00095:0000001379572773754:00002
>     column: message:taobao_auction
>     at time: Thu Sep 19 14:39:33 CST 2013
>   Action:
>     row: 00095:0000001379572773754:00003
>     column: message:taobao_auction
>     at time: Thu Sep 19 14:39:33 CST 2013
> Exception in thread "main" java.lang.IndexOutOfBoundsException: index (5) must be less than size (5)
> 	at com.google.common.base.Preconditions.checkElementIndex(Preconditions.java:301)
> 	at com.google.common.base.Preconditions.checkElementIndex(Preconditions.java:280)
> 	at org.apache.hadoop.hbase.regionserver.wal.LRUDictionary$BidirectionalLRUMap.get(LRUDictionary.java:122)
> 	at org.apache.hadoop.hbase.regionserver.wal.LRUDictionary$BidirectionalLRUMap.access$000(LRUDictionary.java:69)
> 	at org.apache.hadoop.hbase.regionserver.wal.LRUDictionary.getEntry(LRUDictionary.java:40)
> 	at org.apache.hadoop.hbase.regionserver.wal.Compressor.readCompressed(Compressor.java:148)
> 	at org.apache.hadoop.hbase.regionserver.wal.HLogKey.readFields(HLogKey.java:307)
> 	at org.apache.hadoop.io.SequenceFile$Reader.next(SequenceFile.java:2199)
> 	at org.apache.hadoop.io.SequenceFile$Reader.next(SequenceFile.java:2239)
> 	at org.apache.hadoop.hbase.regionserver.wal.SequenceFileLogReader.next(SequenceFileLogReader.java:245)
> 	at org.apache.hadoop.hbase.regionserver.wal.SequenceFileLogReader.next(SequenceFileLogReader.java:214)
> 	at org.apache.hadoop.hbase.regionserver.wal.HLogPrettyPrinter.processFile(HLogPrettyPrinter.java:242)
> 	at org.apache.hadoop.hbase.regionserver.wal.HLogPrettyPrinter.run(HLogPrettyPrinter.java:371)
> 	at org.apache.hadoop.hbase.regionserver.wal.HLog.main(HLog.java:1935)
> {noformat}
> The exception was the same, but in the hlog dump we can see that the {color:red}sequence id '*80485091470*'{color} is missing. That missing entry most likely corresponds to the missing index '*5*' in the exception.
> So I think there might be a bug in the HLog writing logic that leaves a 'gap' in the sequence writing phase; a sketch of the suspected interleaving follows.
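>
> To make that concrete: suppose the writer grabbed sequence id 80485091470 and dictionary-encoded that edit, but the RS halted before the record reached HDFS. The durable log then jumps from ...469 to ...471 while the writer's dictionary has already advanced by one entry. The hypothetical timeline below, reusing DictSketch from above (GapDemo and its data are purely illustrative), reproduces the exact error message:
> {code:java}
> import java.io.*;
>
> // The edit for sequence 80485091470 grows the writer's dictionary (entry #5)
> // but its bytes never reach the log, so a later reference to index 5 fails
> // on the reader with "index (5) must be less than size (5)".
> public final class GapDemo {
>   public static void main(String[] args) throws IOException {
>     DictSketch writer = new DictSketch();
>     DictSketch reader = new DictSketch();
>
>     ByteArrayOutputStream onDisk = new ByteArrayOutputStream();  // bytes that hit HDFS
>     DataOutputStream durable = new DataOutputStream(onDisk);
>     DataOutputStream lost = new DataOutputStream(new ByteArrayOutputStream());  // never synced
>
>     for (int i = 0; i < 5; i++) {                      // entries 0..4 are durable
>       writer.write(durable, ("region-" + i).getBytes(), null);
>     }
>     writer.write(lost, "region-5".getBytes(), null);   // the '80485091470' edit: dict grows, bytes lost
>     writer.write(durable, null, 5);                    // a later edit re-uses dictionary index 5
>
>     DataInputStream in = new DataInputStream(new ByteArrayInputStream(onDisk.toByteArray()));
>     for (int i = 0; i < 6; i++) {
>       reader.read(in);  // the 6th read throws IndexOutOfBoundsException: index (5) ... size (5)
>     }
>   }
> }
> {code}
> If the append of sequence 80485091470 raced with the halt under the updatesLock (as in HBASE-9645), an interleaving like this would explain both the gap in the dump and the out-of-range dictionary index.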


