Posted to issues@hbase.apache.org by "shihuafeng (Jira)" <ji...@apache.org> on 2021/11/04 02:01:00 UTC
[jira] [Comment Edited] (HBASE-26209) edit file loss result in data loss when restart hbase cluster

    [ https://issues.apache.org/jira/browse/HBASE-26209?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17438440#comment-17438440 ] 

shihuafeng edited comment on HBASE-26209 at 11/4/21, 2:00 AM:
--------------------------------------------------------------

I reproduced the scenario when I restarted the HBase cluster (Tue Nov 2 11:56:36 CST 2021, HBase restart).

I found that the edit files that could not be replayed are present on HDFS, but they were not read by the following method:

 *NavigableSet<Path> files = WALSplitter.getSplitEditFilesSorted(fs, regiondir);*
 # *Why were the edit files not read?*

          Region assignment happened while log splitting was still in progress, so only part of the edit files were read.

          *open region* *log*

hbase-cmf-hbase-REGIONSERVER-gy14.esgync.local.log.out.1{color:#ff0000}:2021-11-02 12:01:06,05{color}5 INFO org.apache.hadoop.hbase.regionserver.RSRpcServices: Open TRAFODION.JAVABENCH3.OE_STOCK_INDEX_300,\x00\x00\x00\x02X\xF1z\x0A\x1F\x01\x00\x00\x00\x00\x00\x00\x00\x00,1635500260495.617e7194714d0952104a3626935495db.

         *split log*

         {color:#ff0000}2021-11-02 12:01:00,578{color} INFO org.apache.hadoop.hbase.master.SplitLogManager: Started splitting 38 logs in [hdfs://nameservice1/hbase/WALs/gy25.esgync.local,60020,1635822374932-splitting] for [gy25.esgync.local,60020,1635822374932]

         *some edit files were still being split*

       {color:#ff0000} 2021-11-02 12:01:10,738{color} INFO org.apache.hadoop.hbase.wal.WALSplitter: Rename hdfs://nameservice1/hbase/data/default/TRAFODION.JAVABENCH3.OE_STOCK_INDEX_300/617e7194714d0952104a3626935495db/recovered.edits/0000000000015560856.temp to hdfs://nameservice1/hbase/data/default/TRAFODION.JAVABENCH3.OE_STOCK_INDEX_300/617e7194714d0952104a3626935495db/recovered.edits/0000000000015575618
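The ordering in the three log lines above (split started at 12:01:00, region opened at 12:01:06, rename at 12:01:10) can be illustrated with a small stand-alone sketch. This is not HBase code: it uses a local temporary directory in place of HDFS, the class and method names are hypothetical, the first file name is made up, and it assumes that the directory scan ignores files that still carry the .temp suffix.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.TreeSet;
import java.util.stream.Stream;

public class RecoveredEditsRaceSketch {

    // Hypothetical stand-in for the directory scan: returns the names of
    // finished edit files, ignoring files that still end in ".temp".
    static TreeSet<String> listFinishedEdits(Path recoveredEdits) {
        try (Stream<Path> entries = Files.list(recoveredEdits)) {
            TreeSet<String> names = new TreeSet<>();
            entries.map(p -> p.getFileName().toString())
                   .filter(n -> !n.endsWith(".temp"))
                   .forEach(names::add);
            return names;
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    // Replays the ordering seen in the logs: scan first, rename afterwards.
    // Returns {files seen by the early scan, files present after the rename}.
    static int[] scanBeforeAndAfterRename() {
        try {
            Path dir = Files.createTempDirectory("recovered.edits");
            // One edit file is already finalized (made-up name).
            Files.createFile(dir.resolve("0000000000015500000"));
            // Another one is still being written by the splitter.
            Path temp = Files.createFile(dir.resolve("0000000000015560856.temp"));

            int seenByRegionOpen = listFinishedEdits(dir).size();  // scan at region open
            Files.move(temp, dir.resolve("0000000000015575618"));  // rename happens later
            int presentAfterSplit = listFinishedEdits(dir).size();

            return new int[] {seenByRegionOpen, presentAfterSplit};
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        int[] counts = scanBeforeAndAfterRename();
        System.out.println("seen at region open: " + counts[0]
            + ", present after split: " + counts[1]);
    }
}
```

If the scan runs before the rename, the renamed file exists on disk afterwards but was never in the set that gets replayed, which matches the symptom described here.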

     

 

 


was (Author: shihuafeng):
I reproduced the scenario when I restarted the HBase cluster.

I found that the edit files that could not be replayed are present on HDFS, but they were not read by the following method:

 *NavigableSet<Path> files = WALSplitter.getSplitEditFilesSorted(fs, regiondir);*
 # *Why were the edit files not read?*

          Region assignment happened while log splitting was still in progress, so only part of the edit files were read.

          *open region* *log*

hbase-cmf-hbase-REGIONSERVER-gy14.esgync.local.log.out.1{color:#FF0000}:2021-11-02 12:01:06,05{color}5 INFO org.apache.hadoop.hbase.regionserver.RSRpcServices: Open TRAFODION.JAVABENCH3.OE_STOCK_INDEX_300,\x00\x00\x00\x02X\xF1z\x0A\x1F\x01\x00\x00\x00\x00\x00\x00\x00\x00,1635500260495.617e7194714d0952104a3626935495db.

         *split log*

         {color:#FF0000}2021-11-02 12:01:00,578{color} INFO org.apache.hadoop.hbase.master.SplitLogManager: Started splitting 38 logs in [hdfs://nameservice1/hbase/WALs/gy25.esgync.local,60020,1635822374932-splitting] for [gy25.esgync.local,60020,1635822374932]

         *some edit files were still being split*

       {color:#FF0000} 2021-11-02 12:01:10,738{color} INFO org.apache.hadoop.hbase.wal.WALSplitter: Rename hdfs://nameservice1/hbase/data/default/TRAFODION.JAVABENCH3.OE_STOCK_INDEX_300/617e7194714d0952104a3626935495db/recovered.edits/0000000000015560856.temp to hdfs://nameservice1/hbase/data/default/TRAFODION.JAVABENCH3.OE_STOCK_INDEX_300/617e7194714d0952104a3626935495db/recovered.edits/0000000000015575618

     

 

 

> edit file loss result in  data loss  when restart hbase cluster
> ---------------------------------------------------------------
>
>                 Key: HBASE-26209
>                 URL: https://issues.apache.org/jira/browse/HBASE-26209
>             Project: HBase
>          Issue Type: Bug
>          Components: regionserver
>         Environment: Linux version 3.10.0-693.el7.x86_64 (mockbuild@x86-038.build.eng.bos.redhat.com) (gcc version 4.8.5 20150623 (Red Hat 4.8.5-16) (GCC) ) #1 SMP Thu Jul 6 19:56:57 EDT 2017
>            Reporter: shihuafeng
>            Priority: Blocker
>         Attachments: Repaly_edit.log, rename_edit.log
>
>
> {{When I restart the HBase cluster, I find that an edit file is lost during WAL replay.}}
>          When the WAL is split into edit files, 31 files are renamed (00000000seqid.temp to 00000000seqid). But when the edits are read for replay, I found only 30.
>          The log shows that the rename succeeded, but I cannot find the edit file.
> {panel:title=/var/log/message: I found a system exception}
>   ACPI Error: SMBus/IPMI/GenericSerialBus write requires Buffer of length 66, found length 32 (20130517/exfield-389)
>  Aug 16 03:30:41 esgsh6 kernel: ACPI Error: Method parse/execution failed [\_SB_.PMI0._PMM] (Node ffff8810e9eab258), AE_AML_BUFFER_LIMIT (20130517/psparse-536)
> {panel}
>  #  *Rename phase: 31 files are renamed (00000000seqid.temp to 00000000seqid)*
>         
>          I cannot find the following file. I confirmed that the edit file (0*000000000001825010*) was not empty.
>   
> {panel:title=log}
> hbase-cmf-hbase-REGIONSERVER-gy11.esgync.local.log.out:2021-08-16 17:56:28,956 INFO org.apache.hadoop.hbase.wal.WALSplitter: Rename hdfs://nameservice1/hbase/data/default/TRAFODION.JAVABENCH2.OE_STOCK_INDEX_300/8a42de414d97b457da88bc4682dd7c52/recovered.edits/0000000000001810650.temp to hdfs://nameservice1/hbase/data/default/TRAFODION.JAVABENCH2.OE_STOCK_INDEX_300/8a42de414d97b457da88bc4682dd7c52/recovered.edits/0*000000000001825010*
> {panel}
>          See the attachment *{color:#3366ff}rename_edit.log{color}*.
>      *2. Replay phase: only 30 edit files are read*
>   
> {panel:title=log}
>       hbase-cmf-hbase-REGIONSERVER-gy11.esgync.local.log.out:2021-08-16 17:56:14,938      INFO org.apache.hadoop.hbase.regionserver.HRegion: after replayRecoveredEdits Maximum sequenceid  1914955 and minimum sequenceid for the region is 1916711, replay the file, path=hdfs://nameservice1/hbase/data/default/TRAFODION.JAVABENCH2.OE_STOCK_INDEX_300/8a42de414d97b457da88bc4682dd7c52/recovered.edits/0000000000001914955,seqid=1916711,*size=30*
> {panel}
>   
>   
> {code:java}
> // From org.apache.hadoop.hbase.regionserver.HRegion
> NavigableSet<Path> files = WALSplitter.getSplitEditFilesSorted(fs, regiondir);
> if (LOG.isDebugEnabled()) {
>   LOG.debug("Found " + (files == null ? 0 : files.size())
>       + " recovered edits file(s) under " + regiondir);
> }
> if (files == null || files.isEmpty()) return seqid;
> long start = System.currentTimeMillis();
> for (Path edits : files) {
>   if (edits == null || !fs.exists(edits)) {
>     LOG.warn("Null or non-existent edits file: " + edits);
>     continue;
>   }
>   if (isZeroLengthThenDelete(fs, edits)) continue;
>   long maxSeqId;
>   String fileName = edits.getName();
>   maxSeqId = Math.abs(Long.parseLong(fileName));
>   if (maxSeqId <= minSeqIdForTheRegion) {
>     if (LOG.isDebugEnabled()) {
>       String msg = "Maximum sequenceid for this wal is " + maxSeqId
>           + " and minimum sequenceid for the region is " + minSeqIdForTheRegion
>           + ", skipped the whole file, path=" + edits;
>       LOG.info(msg);
>     }
>     continue;
>   }
>   try {
>     seqid = Math.max(seqid, replayRecoveredEdits(edits, maxSeqIdInStores, reporter));
>     // replay the edits. Replay can return -1 if everything is skipped, only update
>     // if seqId is greater
>     String msg = "after replayRecoveredEdits Maximum sequenceid " + maxSeqId
>         + " and minimum sequenceid for the region is " + minSeqIdForTheRegion
>         + ", replay the file, path=" + edits
>         + ",seqid=" + seqid + ",size=" + files.size();
>     LOG.info(msg);
> {code}
>  
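The 31-versus-30 discrepancy in the quoted description could be narrowed down by comparing the rename targets with the replayed paths in the region server log. The following is a minimal sketch, assuming log lines shaped like the WALSplitter "Rename ... to ..." line and the reporter's custom "replay the file, path=..." line quoted above. The class name, method name, and sample paths are hypothetical.

```java
import java.util.List;
import java.util.Set;
import java.util.TreeSet;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class MissingEditsFinder {

    // Captures the target of a "Rename <src>.temp to <dst>" line.
    private static final Pattern RENAMED =
        Pattern.compile("Rename \\S+\\.temp to (\\S+)");

    // Captures the path in a "replay the file, path=<path>,seqid=..." line.
    private static final Pattern REPLAYED =
        Pattern.compile("replay the file, path=([^,\\s]+)");

    // Returns rename targets that never show up in a replay line.
    static Set<String> missingFromReplay(List<String> logLines) {
        Set<String> renamed = new TreeSet<>();
        Set<String> replayed = new TreeSet<>();
        for (String line : logLines) {
            Matcher m = RENAMED.matcher(line);
            if (m.find()) {
                renamed.add(m.group(1));
            }
            m = REPLAYED.matcher(line);
            if (m.find()) {
                replayed.add(m.group(1));
            }
        }
        renamed.removeAll(replayed);
        return renamed;
    }

    public static void main(String[] args) {
        // Made-up sample lines in the same shape as the quoted logs.
        List<String> lines = List.of(
            "INFO WALSplitter: Rename hdfs://nn/r/recovered.edits/0000000000000000010.temp"
                + " to hdfs://nn/r/recovered.edits/0000000000000000020",
            "INFO WALSplitter: Rename hdfs://nn/r/recovered.edits/0000000000000000021.temp"
                + " to hdfs://nn/r/recovered.edits/0000000000000000030",
            "INFO HRegion: after replayRecoveredEdits ..., replay the file,"
                + " path=hdfs://nn/r/recovered.edits/0000000000000000020,seqid=25,size=1");
        System.out.println(missingFromReplay(lines));
    }
}
```

Files that the replay loop skips on purpose (zero-length files, or files whose maximum sequence id is not above the region's minimum) would also show up as missing in this comparison, so each reported path still needs to be checked by hand.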



--
This message was sent by Atlassian Jira
(v8.3.4#803005)