Posted to user@hbase.apache.org by Mingjian Deng <ko...@gmail.com> on 2011/10/18 16:40:51 UTC

data loss when splitLog()

Hi:
    There is a case that causes data loss in our cluster. We got blocked in
splitLog() because of an error in our HDFS, and we killed the master. Some
HLog files were moved from .logs to .oldlogs before they were written to
.recovered.edits, so the region servers couldn't replay these files.
    In HLogSplitter.java, we found:
    ...
    archiveLogs(srcDir, corruptedLogs, processedLogs, oldLogDir, fs, conf);
    } finally {
      LOG.info("Finishing writing output logs and closing down.");
      splits = outputSink.finishWritingAndClose();
    }
    Why is archiveLogs called before outputSink.finishWritingAndClose()?
Could these hlog files be moved to .oldlogs and become impossible to split on
the next startup if the write threads failed but archiveLogs succeeded?
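The safe ordering being asked for can be sketched in miniature (hypothetical names and simplified logic, not the actual HBase HLogSplitter API): if archiving only runs after all recovered edits have been written out, there is no window in which a log sits in .oldlogs with edits that were never persisted.

```java
import java.util.ArrayList;
import java.util.List;

// Simplified, hypothetical sketch of the safe ordering: write out all
// recovered edits first, archive the source logs only afterwards.
public class SplitOrderSketch {
    final List<String> recoveredEdits = new ArrayList<>(); // stands in for .recovered.edits
    final List<String> oldLogs = new ArrayList<>();        // stands in for .oldlogs

    public void split(List<String> hlogs) {
        List<String> processed = new ArrayList<>();
        // analogue of outputSink.finishWritingAndClose(): edits become durable
        for (String hlog : hlogs) {
            recoveredEdits.add("edits:" + hlog);
            processed.add(hlog);
        }
        // analogue of archiveLogs(): only fully processed logs are moved,
        // so a crash before this point leaves them in .logs for a re-split
        oldLogs.addAll(processed);
    }
}
```

With the archive step last, a master killed mid-split leaves the unprocessed logs in .logs, and the next startup can split them again instead of finding them already moved to .oldlogs.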

Re: data loss when splitLog()

Posted by Jean-Daniel Cryans <jd...@apache.org>.
Mmm ok, how did you kill the master exactly? kill -9 or a normal shutdown? I
think I could see how it would happen in the case of a normal shutdown, but
even then it would *really really* help to see the logs of what's going on.

J-D

On Tue, Oct 18, 2011 at 6:37 PM, Mingjian Deng <ko...@gmail.com> wrote:

> @J-D: I used Cloudera CDH3. This loss doesn't reproduce every time, but it
> can happen with the following logs:
> "2011-10-19 04:44:09,065 DEBUG
> org.apache.hadoop.hbase.regionserver.wal.HLogSplitter: Used 134218288 bytes
> of buffered edits, waiting for IO threads..."
> This log printed many times, and even the 134218288 didn't change. I killed
> the master and restarted it, and the data was lost. So I think the entry
> accounting for those 134218288 bytes was the last entry in memory. In the
> following code:
>
>     synchronized (dataAvailable) {
>       totalBuffered += incrHeap;
>       while (totalBuffered > maxHeapUsage
>           && (thrown == null || thrown.get() == null)) {
>         LOG.debug("Used " + totalBuffered
>             + " bytes of buffered edits, waiting for IO threads...");
>         dataAvailable.wait(3000);
>       }
>       dataAvailable.notifyAll();
>     }
>
> If (totalBuffered <= maxHeapUsage) and there are no more entries in the
> .logs dir, archiveLogs would execute even before the write threads end.

Re: data loss when splitLog()

Posted by Mingjian Deng <ko...@gmail.com>.
@J-D: I used Cloudera CDH3. This loss doesn't reproduce every time, but it
can happen with the following logs:
"2011-10-19 04:44:09,065 DEBUG
org.apache.hadoop.hbase.regionserver.wal.HLogSplitter: Used 134218288 bytes
of buffered edits, waiting for IO threads..."
This log printed many times, and even the 134218288 didn't change. I killed
the master and restarted it, and the data was lost. So I think the entry
accounting for those 134218288 bytes was the last entry in memory. In the
following code:

    synchronized (dataAvailable) {
      totalBuffered += incrHeap;
      while (totalBuffered > maxHeapUsage
          && (thrown == null || thrown.get() == null)) {
        LOG.debug("Used " + totalBuffered
            + " bytes of buffered edits, waiting for IO threads...");
        dataAvailable.wait(3000);
      }
      dataAvailable.notifyAll();
    }

If (totalBuffered <= maxHeapUsage) and there are no more entries in the
.logs dir, archiveLogs would execute even before the write threads end.
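The behavior described can be reproduced with a stripped-down, hypothetical version of that loop: the waiting thread blocks only while totalBuffered is over the cap, so it can move on (toward archiveLogs) while writer threads still hold buffered, unwritten entries.

```java
// Hypothetical, stripped-down sketch of the quoted backpressure loop.
// The key property: the waiting side returns as soon as the buffer is at
// or under maxHeapUsage, even if unwritten entries remain buffered.
public class BackpressureSketch {
    private final Object dataAvailable = new Object();
    private long totalBuffered = 0;
    private final long maxHeapUsage;

    public BackpressureSketch(long maxHeapUsage) {
        this.maxHeapUsage = maxHeapUsage;
    }

    // Reader side: account for a newly buffered entry, then block only
    // while over the cap (same shape as the quoted HLogSplitter code).
    public void onEntryBuffered(long incrHeap) throws InterruptedException {
        synchronized (dataAvailable) {
            totalBuffered += incrHeap;
            while (totalBuffered > maxHeapUsage) {
                dataAvailable.wait(3000);
            }
            dataAvailable.notifyAll();
        }
    }

    // Writer side: an entry was persisted, release its heap accounting.
    public void onEntryWritten(long decrHeap) {
        synchronized (dataAvailable) {
            totalBuffered -= decrHeap;
            dataAvailable.notifyAll();
        }
    }

    public long buffered() {
        synchronized (dataAvailable) {
            return totalBuffered;
        }
    }
}
```

Note that onEntryBuffered() returning does not mean the buffer is empty, only that it is at or under the cap; code that treats its return as "all edits written" and proceeds to archive the source logs has exactly the window described above.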

2011/10/19 Jean-Daniel Cryans <jd...@apache.org>

> Even if the files aren't closed properly, the fact that you are appending
> should persist them.
>
> Are you using a version of Hadoop that supports sync?
>
> Do you have logs that show the issue where the logs were moved but not
> written?
>
> Thx,
>
> J-D

Re: data loss when splitLog()

Posted by Jean-Daniel Cryans <jd...@apache.org>.
Even if the files aren't closed properly, the fact that you are appending
should persist them.

Are you using a version of Hadoop that supports sync?

Do you have logs that show the issue where the logs were moved but not
written?

Thx,

J-D
