You are viewing a plain text version of this content. The canonical link for it is here.
Posted to dev@hbase.apache.org by "Cosmin Lehene (JIRA)" <ji...@apache.org> on 2009/11/19 15:22:39 UTC

[jira] Created: (HBASE-1994) Master will lose hlog entries while splitting if region has empty oldlogfile.log

Master will lose hlog entries while splitting if region has empty oldlogfile.log
--------------------------------------------------------------------------------

                 Key: HBASE-1994
                 URL: https://issues.apache.org/jira/browse/HBASE-1994
             Project: Hadoop HBase
          Issue Type: Bug
          Components: master
    Affects Versions: 0.21.0
            Reporter: Cosmin Lehene
            Priority: Blocker
             Fix For: 0.21.0


I don't know yet how an empty oldlogfile.log can exist, however it happened.
Master will fail to put the splits in the region oldlogfile.log if an empty oldlogfile.log already exists there.
This is the master log after I artificially reproduced it by placing an empty oldlogfile.log in /hbase/.META./1028785192/oldlogfile.log and then killed the regionserver that was holding the .META. table

2009-11-19 09:08:36,012 INFO org.apache.hadoop.hbase.regionserver.wal.HLog: Splitting 1 hlog(s) in hdfs://b0:9000/hbase/.logs/b4,60020,1258637492773
2009-11-19 09:08:36,012 DEBUG org.apache.hadoop.hbase.regionserver.wal.HLog: Splitting hlog 1 of 1: hdfs://b0:9000/hbase/.logs/b4,60020,1258637492773/hlog.dat.1258637493128, length=0
2009-11-19 09:08:36,019 DEBUG org.apache.hadoop.hbase.regionserver.wal.HLog: Adding queue for .META.,,1
2009-11-19 09:08:36,037 DEBUG org.apache.hadoop.hbase.regionserver.wal.HLog: Pushed=795 entries from hdfs://b0:9000/hbase/.logs/b4,60020,1258637492773/hlog.dat.1258637493128
2009-11-19 09:08:36,038 DEBUG org.apache.hadoop.hbase.regionserver.wal.HLog: Thread got 795 to process
2009-11-19 09:08:36,043 WARN org.apache.hadoop.hbase.regionserver.wal.HLog: Old hlog file hdfs://b0:9000/hbase/.META./1028785192/oldlogfile.log already exists. Copying existing file to new file
2009-11-19 09:08:36,079 WARN org.apache.hadoop.hbase.regionserver.wal.HLog: Got while writing region .META.,,1 log java.io.EOFException
2009-11-19 09:08:36,081 INFO org.apache.hadoop.hbase.regionserver.wal.HLog: hlog file splitting completed in 70 millis for hdfs://b0:9000/hbase/.logs/b4,60020,1258637492773


-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.


[jira] Commented: (HBASE-1994) Master will lose hlog entries while splitting if region has empty oldlogfile.log

Posted by "Lars George (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/HBASE-1994?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12780025#action_12780025 ] 

Lars George commented on HBASE-1994:
------------------------------------

This could simply follow or use the normal file name conventions for hlogs:
{code}
...
        this.filenum = System.currentTimeMillis();
        Path newPath = computeFilename(this.filenum);
...

  public Path computeFilename(final long fn) {
    if (fn < 0) return null;
    return new Path(dir, HLOG_DATFILE + fn);
  }
{code}

Just with the known extension. Then in RS and Store classes change

{code}
  /** Name of old log file for reconstruction */
  static final String HREGION_OLDLOGFILE_NAME = "oldlogfile.log";
{code}

to be a extension only and not pass this on along the Store initialization but scan the FS for such file at the right spot in the code. Seems cleaner too, as it removes at least one parameter.

> Master will lose hlog entries while splitting if region has empty oldlogfile.log
> --------------------------------------------------------------------------------
>
>                 Key: HBASE-1994
>                 URL: https://issues.apache.org/jira/browse/HBASE-1994
>             Project: Hadoop HBase
>          Issue Type: Bug
>          Components: master
>    Affects Versions: 0.21.0
>            Reporter: Cosmin Lehene
>            Priority: Blocker
>             Fix For: 0.21.0
>
>   Original Estimate: 48h
>  Remaining Estimate: 48h
>
> I don't know yet how an empty oldlogfile.log can exist, however it happened.
> Master will fail to put the splits in the region oldlogfile.log if an empty oldlogfile.log already exists there.
> This is the master log after I artificially reproduced it by placing an empty oldlogfile.log in /hbase/.META./1028785192/oldlogfile.log and then killed the regionserver that was holding the .META. table
> 2009-11-19 09:08:36,012 INFO org.apache.hadoop.hbase.regionserver.wal.HLog: Splitting 1 hlog(s) in hdfs://b0:9000/hbase/.logs/b4,60020,1258637492773
> 2009-11-19 09:08:36,012 DEBUG org.apache.hadoop.hbase.regionserver.wal.HLog: Splitting hlog 1 of 1: hdfs://b0:9000/hbase/.logs/b4,60020,1258637492773/hlog.dat.1258637493128, length=0
> 2009-11-19 09:08:36,019 DEBUG org.apache.hadoop.hbase.regionserver.wal.HLog: Adding queue for .META.,,1
> 2009-11-19 09:08:36,037 DEBUG org.apache.hadoop.hbase.regionserver.wal.HLog: Pushed=795 entries from hdfs://b0:9000/hbase/.logs/b4,60020,1258637492773/hlog.dat.1258637493128
> 2009-11-19 09:08:36,038 DEBUG org.apache.hadoop.hbase.regionserver.wal.HLog: Thread got 795 to process
> 2009-11-19 09:08:36,043 WARN org.apache.hadoop.hbase.regionserver.wal.HLog: Old hlog file hdfs://b0:9000/hbase/.META./1028785192/oldlogfile.log already exists. Copying existing file to new file
> 2009-11-19 09:08:36,079 WARN org.apache.hadoop.hbase.regionserver.wal.HLog: Got while writing region .META.,,1 log java.io.EOFException
> 2009-11-19 09:08:36,081 INFO org.apache.hadoop.hbase.regionserver.wal.HLog: hlog file splitting completed in 70 millis for hdfs://b0:9000/hbase/.logs/b4,60020,1258637492773

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.


[jira] Commented: (HBASE-1994) Master will lose hlog entries while splitting if region has empty oldlogfile.log

Posted by "Lars George (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/HBASE-1994?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12781189#action_12781189 ] 

Lars George commented on HBASE-1994:
------------------------------------

I read the BigTable paper again and found that my "approach" is shunted there:

{quote}
One approach would be for each new tablet server to read this full commit log file and apply just the entries needed for the tablets it needs to recover. However, under such a scheme, if 100 machines were each assigned a single tablet from a failed tablet server, then the log file would be read 100 times (once by each server).
{quote}

Makes sense, if a RS hosted 100 regions (which is often a rather low figure in what is reported by users) those will be spread across the cluster and could result in N RS's reading the file trying to find what is where. So we could keep the threaded split we have or think about simply do the sort as suggest by the BigTable paper and then have simple range reads on each RS.

{quote}
We avoid duplicating log reads by first sorting the commit log entries in order of the keys (table, row name, log sequence number). In the sorted output, all mutations for a particular tablet are contiguous and can therefore be read efficiently with one disk seek followed by a sequential read. To parallelize the sorting, we partition the log file into 64 MB segments, and sort each segment in parallel on different tablet servers. This sorting process is coordinated by the master and is initiated when a tablet server indicates that it needs to recover mutations from some commit log file.{quote}

What did you discuss during the hackathon?

> Master will lose hlog entries while splitting if region has empty oldlogfile.log
> --------------------------------------------------------------------------------
>
>                 Key: HBASE-1994
>                 URL: https://issues.apache.org/jira/browse/HBASE-1994
>             Project: Hadoop HBase
>          Issue Type: Bug
>          Components: master
>    Affects Versions: 0.21.0
>            Reporter: Cosmin Lehene
>            Priority: Blocker
>             Fix For: 0.21.0
>
>   Original Estimate: 48h
>  Remaining Estimate: 48h
>
> I don't know yet how an empty oldlogfile.log can exist, however it happened.
> Master will fail to put the splits in the region oldlogfile.log if an empty oldlogfile.log already exists there.
> This is the master log after I artificially reproduced it by placing an empty oldlogfile.log in /hbase/.META./1028785192/oldlogfile.log and then killed the regionserver that was holding the .META. table
> 2009-11-19 09:08:36,012 INFO org.apache.hadoop.hbase.regionserver.wal.HLog: Splitting 1 hlog(s) in hdfs://b0:9000/hbase/.logs/b4,60020,1258637492773
> 2009-11-19 09:08:36,012 DEBUG org.apache.hadoop.hbase.regionserver.wal.HLog: Splitting hlog 1 of 1: hdfs://b0:9000/hbase/.logs/b4,60020,1258637492773/hlog.dat.1258637493128, length=0
> 2009-11-19 09:08:36,019 DEBUG org.apache.hadoop.hbase.regionserver.wal.HLog: Adding queue for .META.,,1
> 2009-11-19 09:08:36,037 DEBUG org.apache.hadoop.hbase.regionserver.wal.HLog: Pushed=795 entries from hdfs://b0:9000/hbase/.logs/b4,60020,1258637492773/hlog.dat.1258637493128
> 2009-11-19 09:08:36,038 DEBUG org.apache.hadoop.hbase.regionserver.wal.HLog: Thread got 795 to process
> 2009-11-19 09:08:36,043 WARN org.apache.hadoop.hbase.regionserver.wal.HLog: Old hlog file hdfs://b0:9000/hbase/.META./1028785192/oldlogfile.log already exists. Copying existing file to new file
> 2009-11-19 09:08:36,079 WARN org.apache.hadoop.hbase.regionserver.wal.HLog: Got while writing region .META.,,1 log java.io.EOFException
> 2009-11-19 09:08:36,081 INFO org.apache.hadoop.hbase.regionserver.wal.HLog: hlog file splitting completed in 70 millis for hdfs://b0:9000/hbase/.logs/b4,60020,1258637492773

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.


[jira] Commented: (HBASE-1994) Master will lose hlog entries while splitting if region has empty oldlogfile.log

Posted by "Andrew Purtell (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/HBASE-1994?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12780014#action_12780014 ] 

Andrew Purtell commented on HBASE-1994:
---------------------------------------

bq. Or use a UUID style filename for them with a known extension that the RS knows how to find and then replays them in order (so filename could be a timestamp?).

+1

bq. As for empty .log files we could add an easy check that if empty then remove as opposed to rename.

+1


> Master will lose hlog entries while splitting if region has empty oldlogfile.log
> --------------------------------------------------------------------------------
>
>                 Key: HBASE-1994
>                 URL: https://issues.apache.org/jira/browse/HBASE-1994
>             Project: Hadoop HBase
>          Issue Type: Bug
>          Components: master
>    Affects Versions: 0.21.0
>            Reporter: Cosmin Lehene
>            Priority: Blocker
>             Fix For: 0.21.0
>
>   Original Estimate: 48h
>  Remaining Estimate: 48h
>
> I don't know yet how an empty oldlogfile.log can exist, however it happened.
> Master will fail to put the splits in the region oldlogfile.log if an empty oldlogfile.log already exists there.
> This is the master log after I artificially reproduced it by placing an empty oldlogfile.log in /hbase/.META./1028785192/oldlogfile.log and then killed the regionserver that was holding the .META. table
> 2009-11-19 09:08:36,012 INFO org.apache.hadoop.hbase.regionserver.wal.HLog: Splitting 1 hlog(s) in hdfs://b0:9000/hbase/.logs/b4,60020,1258637492773
> 2009-11-19 09:08:36,012 DEBUG org.apache.hadoop.hbase.regionserver.wal.HLog: Splitting hlog 1 of 1: hdfs://b0:9000/hbase/.logs/b4,60020,1258637492773/hlog.dat.1258637493128, length=0
> 2009-11-19 09:08:36,019 DEBUG org.apache.hadoop.hbase.regionserver.wal.HLog: Adding queue for .META.,,1
> 2009-11-19 09:08:36,037 DEBUG org.apache.hadoop.hbase.regionserver.wal.HLog: Pushed=795 entries from hdfs://b0:9000/hbase/.logs/b4,60020,1258637492773/hlog.dat.1258637493128
> 2009-11-19 09:08:36,038 DEBUG org.apache.hadoop.hbase.regionserver.wal.HLog: Thread got 795 to process
> 2009-11-19 09:08:36,043 WARN org.apache.hadoop.hbase.regionserver.wal.HLog: Old hlog file hdfs://b0:9000/hbase/.META./1028785192/oldlogfile.log already exists. Copying existing file to new file
> 2009-11-19 09:08:36,079 WARN org.apache.hadoop.hbase.regionserver.wal.HLog: Got while writing region .META.,,1 log java.io.EOFException
> 2009-11-19 09:08:36,081 INFO org.apache.hadoop.hbase.regionserver.wal.HLog: hlog file splitting completed in 70 millis for hdfs://b0:9000/hbase/.logs/b4,60020,1258637492773

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.


[jira] Commented: (HBASE-1994) Master will lose hlog entries while splitting if region has empty oldlogfile.log

Posted by "Lars George (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/HBASE-1994?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12780010#action_12780010 ] 

Lars George commented on HBASE-1994:
------------------------------------

This is one of those events I never understood how they happen in the first place. I mean if we have a oldlogfile.log that was not yet replayed, why put another in place and not fail fast. Or use a UUID style filename for them with a known extension that the RS knows how to find and then replays them in order (so filename could be a timestamp?). Because if this can happen, it can only handle one such occurrence. What happens if that code is executed and there is already a oldlogfile.log.old in place? The code overwrites it right now - but is that what one want?

As for empty .log files we could add an easy check that if empty then remove as opposed to rename.

> Master will lose hlog entries while splitting if region has empty oldlogfile.log
> --------------------------------------------------------------------------------
>
>                 Key: HBASE-1994
>                 URL: https://issues.apache.org/jira/browse/HBASE-1994
>             Project: Hadoop HBase
>          Issue Type: Bug
>          Components: master
>    Affects Versions: 0.21.0
>            Reporter: Cosmin Lehene
>            Priority: Blocker
>             Fix For: 0.21.0
>
>   Original Estimate: 48h
>  Remaining Estimate: 48h
>
> I don't know yet how an empty oldlogfile.log can exist, however it happened.
> Master will fail to put the splits in the region oldlogfile.log if an empty oldlogfile.log already exists there.
> This is the master log after I artificially reproduced it by placing an empty oldlogfile.log in /hbase/.META./1028785192/oldlogfile.log and then killed the regionserver that was holding the .META. table
> 2009-11-19 09:08:36,012 INFO org.apache.hadoop.hbase.regionserver.wal.HLog: Splitting 1 hlog(s) in hdfs://b0:9000/hbase/.logs/b4,60020,1258637492773
> 2009-11-19 09:08:36,012 DEBUG org.apache.hadoop.hbase.regionserver.wal.HLog: Splitting hlog 1 of 1: hdfs://b0:9000/hbase/.logs/b4,60020,1258637492773/hlog.dat.1258637493128, length=0
> 2009-11-19 09:08:36,019 DEBUG org.apache.hadoop.hbase.regionserver.wal.HLog: Adding queue for .META.,,1
> 2009-11-19 09:08:36,037 DEBUG org.apache.hadoop.hbase.regionserver.wal.HLog: Pushed=795 entries from hdfs://b0:9000/hbase/.logs/b4,60020,1258637492773/hlog.dat.1258637493128
> 2009-11-19 09:08:36,038 DEBUG org.apache.hadoop.hbase.regionserver.wal.HLog: Thread got 795 to process
> 2009-11-19 09:08:36,043 WARN org.apache.hadoop.hbase.regionserver.wal.HLog: Old hlog file hdfs://b0:9000/hbase/.META./1028785192/oldlogfile.log already exists. Copying existing file to new file
> 2009-11-19 09:08:36,079 WARN org.apache.hadoop.hbase.regionserver.wal.HLog: Got while writing region .META.,,1 log java.io.EOFException
> 2009-11-19 09:08:36,081 INFO org.apache.hadoop.hbase.regionserver.wal.HLog: hlog file splitting completed in 70 millis for hdfs://b0:9000/hbase/.logs/b4,60020,1258637492773

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.


[jira] Commented: (HBASE-1994) Master will lose hlog entries while splitting if region has empty oldlogfile.log

Posted by "stack (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/HBASE-1994?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12780017#action_12780017 ] 

stack commented on HBASE-1994:
------------------------------

Agree (+1).  oldlogfile.log as name of a file of edits is dumbass.

> Master will lose hlog entries while splitting if region has empty oldlogfile.log
> --------------------------------------------------------------------------------
>
>                 Key: HBASE-1994
>                 URL: https://issues.apache.org/jira/browse/HBASE-1994
>             Project: Hadoop HBase
>          Issue Type: Bug
>          Components: master
>    Affects Versions: 0.21.0
>            Reporter: Cosmin Lehene
>            Priority: Blocker
>             Fix For: 0.21.0
>
>   Original Estimate: 48h
>  Remaining Estimate: 48h
>
> I don't know yet how an empty oldlogfile.log can exist, however it happened.
> Master will fail to put the splits in the region oldlogfile.log if an empty oldlogfile.log already exists there.
> This is the master log after I artificially reproduced it by placing an empty oldlogfile.log in /hbase/.META./1028785192/oldlogfile.log and then killed the regionserver that was holding the .META. table
> 2009-11-19 09:08:36,012 INFO org.apache.hadoop.hbase.regionserver.wal.HLog: Splitting 1 hlog(s) in hdfs://b0:9000/hbase/.logs/b4,60020,1258637492773
> 2009-11-19 09:08:36,012 DEBUG org.apache.hadoop.hbase.regionserver.wal.HLog: Splitting hlog 1 of 1: hdfs://b0:9000/hbase/.logs/b4,60020,1258637492773/hlog.dat.1258637493128, length=0
> 2009-11-19 09:08:36,019 DEBUG org.apache.hadoop.hbase.regionserver.wal.HLog: Adding queue for .META.,,1
> 2009-11-19 09:08:36,037 DEBUG org.apache.hadoop.hbase.regionserver.wal.HLog: Pushed=795 entries from hdfs://b0:9000/hbase/.logs/b4,60020,1258637492773/hlog.dat.1258637493128
> 2009-11-19 09:08:36,038 DEBUG org.apache.hadoop.hbase.regionserver.wal.HLog: Thread got 795 to process
> 2009-11-19 09:08:36,043 WARN org.apache.hadoop.hbase.regionserver.wal.HLog: Old hlog file hdfs://b0:9000/hbase/.META./1028785192/oldlogfile.log already exists. Copying existing file to new file
> 2009-11-19 09:08:36,079 WARN org.apache.hadoop.hbase.regionserver.wal.HLog: Got while writing region .META.,,1 log java.io.EOFException
> 2009-11-19 09:08:36,081 INFO org.apache.hadoop.hbase.regionserver.wal.HLog: hlog file splitting completed in 70 millis for hdfs://b0:9000/hbase/.logs/b4,60020,1258637492773

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.


[jira] Commented: (HBASE-1994) Master will lose hlog entries while splitting if region has empty oldlogfile.log

Posted by "stack (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/HBASE-1994?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12785502#action_12785502 ] 

stack commented on HBASE-1994:
------------------------------

+1 this patch overs this issue.

In hbase-1364, lets make sure we catch stuff Lars raises above.

> Master will lose hlog entries while splitting if region has empty oldlogfile.log
> --------------------------------------------------------------------------------
>
>                 Key: HBASE-1994
>                 URL: https://issues.apache.org/jira/browse/HBASE-1994
>             Project: Hadoop HBase
>          Issue Type: Bug
>          Components: master
>    Affects Versions: 0.21.0
>            Reporter: Cosmin Lehene
>            Assignee: Lars George
>            Priority: Blocker
>             Fix For: 0.21.0
>
>         Attachments: HBASE-1994-EOFE.patch
>
>   Original Estimate: 48h
>  Remaining Estimate: 48h
>
> I don't know yet how an empty oldlogfile.log can exist, however it happened.
> Master will fail to put the splits in the region oldlogfile.log if an empty oldlogfile.log already exists there.
> This is the master log after I artificially reproduced it by placing an empty oldlogfile.log in /hbase/.META./1028785192/oldlogfile.log and then killed the regionserver that was holding the .META. table
> 2009-11-19 09:08:36,012 INFO org.apache.hadoop.hbase.regionserver.wal.HLog: Splitting 1 hlog(s) in hdfs://b0:9000/hbase/.logs/b4,60020,1258637492773
> 2009-11-19 09:08:36,012 DEBUG org.apache.hadoop.hbase.regionserver.wal.HLog: Splitting hlog 1 of 1: hdfs://b0:9000/hbase/.logs/b4,60020,1258637492773/hlog.dat.1258637493128, length=0
> 2009-11-19 09:08:36,019 DEBUG org.apache.hadoop.hbase.regionserver.wal.HLog: Adding queue for .META.,,1
> 2009-11-19 09:08:36,037 DEBUG org.apache.hadoop.hbase.regionserver.wal.HLog: Pushed=795 entries from hdfs://b0:9000/hbase/.logs/b4,60020,1258637492773/hlog.dat.1258637493128
> 2009-11-19 09:08:36,038 DEBUG org.apache.hadoop.hbase.regionserver.wal.HLog: Thread got 795 to process
> 2009-11-19 09:08:36,043 WARN org.apache.hadoop.hbase.regionserver.wal.HLog: Old hlog file hdfs://b0:9000/hbase/.META./1028785192/oldlogfile.log already exists. Copying existing file to new file
> 2009-11-19 09:08:36,079 WARN org.apache.hadoop.hbase.regionserver.wal.HLog: Got while writing region .META.,,1 log java.io.EOFException
> 2009-11-19 09:08:36,081 INFO org.apache.hadoop.hbase.regionserver.wal.HLog: hlog file splitting completed in 70 millis for hdfs://b0:9000/hbase/.logs/b4,60020,1258637492773

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.


[jira] Assigned: (HBASE-1994) Master will lose hlog entries while splitting if region has empty oldlogfile.log

Posted by "Lars George (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/HBASE-1994?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Lars George reassigned HBASE-1994:
----------------------------------

    Assignee: Lars George

> Master will lose hlog entries while splitting if region has empty oldlogfile.log
> --------------------------------------------------------------------------------
>
>                 Key: HBASE-1994
>                 URL: https://issues.apache.org/jira/browse/HBASE-1994
>             Project: Hadoop HBase
>          Issue Type: Bug
>          Components: master
>    Affects Versions: 0.21.0
>            Reporter: Cosmin Lehene
>            Assignee: Lars George
>            Priority: Blocker
>             Fix For: 0.21.0
>
>         Attachments: HBASE-1994-EOFE.patch
>
>   Original Estimate: 48h
>  Remaining Estimate: 48h
>
> I don't know yet how an empty oldlogfile.log can exist, however it happened.
> Master will fail to put the splits in the region oldlogfile.log if an empty oldlogfile.log already exists there.
> This is the master log after I artificially reproduced it by placing an empty oldlogfile.log in /hbase/.META./1028785192/oldlogfile.log and then killed the regionserver that was holding the .META. table
> 2009-11-19 09:08:36,012 INFO org.apache.hadoop.hbase.regionserver.wal.HLog: Splitting 1 hlog(s) in hdfs://b0:9000/hbase/.logs/b4,60020,1258637492773
> 2009-11-19 09:08:36,012 DEBUG org.apache.hadoop.hbase.regionserver.wal.HLog: Splitting hlog 1 of 1: hdfs://b0:9000/hbase/.logs/b4,60020,1258637492773/hlog.dat.1258637493128, length=0
> 2009-11-19 09:08:36,019 DEBUG org.apache.hadoop.hbase.regionserver.wal.HLog: Adding queue for .META.,,1
> 2009-11-19 09:08:36,037 DEBUG org.apache.hadoop.hbase.regionserver.wal.HLog: Pushed=795 entries from hdfs://b0:9000/hbase/.logs/b4,60020,1258637492773/hlog.dat.1258637493128
> 2009-11-19 09:08:36,038 DEBUG org.apache.hadoop.hbase.regionserver.wal.HLog: Thread got 795 to process
> 2009-11-19 09:08:36,043 WARN org.apache.hadoop.hbase.regionserver.wal.HLog: Old hlog file hdfs://b0:9000/hbase/.META./1028785192/oldlogfile.log already exists. Copying existing file to new file
> 2009-11-19 09:08:36,079 WARN org.apache.hadoop.hbase.regionserver.wal.HLog: Got while writing region .META.,,1 log java.io.EOFException
> 2009-11-19 09:08:36,081 INFO org.apache.hadoop.hbase.regionserver.wal.HLog: hlog file splitting completed in 70 millis for hdfs://b0:9000/hbase/.logs/b4,60020,1258637492773

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.


[jira] Commented: (HBASE-1994) Master will lose hlog entries while splitting if region has empty oldlogfile.log

Posted by "Lars George (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/HBASE-1994?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12780981#action_12780981 ] 

Lars George commented on HBASE-1994:
------------------------------------

I am really torn looking at HBASE-1364. It seems there is a lot of planning already to get splits to work better. Is this bug here simply a preliminary fix to avoid the empty log files and also not using the same file names? If so it is fine. But I would like to know what the overall plan is.

Before reading HBASE-1364 I had a similar thought how I would improve on my suggestion above. I thought we could do this:

- Master sees abandoned log and sets marker in ZK for regions contained in log
- RegionServer on start up or region open checks ZK marker and reads the original file extracting the HLogKey's it needs storing them into a local file in the region (this is the distributed log split, not all RS's split here but those that need to read the logs local anyways.) Also possible to also put the data into a MemStore at the same time.
- The RS applies the log and flushes, then sets a semaphore that it completed next to the original log.
- If we have a local copy done as well as per above then the RS can delete it now
- RS clears marker in ZK for its region(s)
- Once the master sees that all RS have read the log and processed it it can delete the original log

I am probably missing some intrinsic details as I have read the HLog etc. code but did not debug it or so to see if I got all the steps right. Let me know what you think and what you want me to do here.

> Master will lose hlog entries while splitting if region has empty oldlogfile.log
> --------------------------------------------------------------------------------
>
>                 Key: HBASE-1994
>                 URL: https://issues.apache.org/jira/browse/HBASE-1994
>             Project: Hadoop HBase
>          Issue Type: Bug
>          Components: master
>    Affects Versions: 0.21.0
>            Reporter: Cosmin Lehene
>            Priority: Blocker
>             Fix For: 0.21.0
>
>   Original Estimate: 48h
>  Remaining Estimate: 48h
>
> I don't know yet how an empty oldlogfile.log can exist, however it happened.
> Master will fail to put the splits in the region oldlogfile.log if an empty oldlogfile.log already exists there.
> This is the master log after I artificially reproduced it by placing an empty oldlogfile.log in /hbase/.META./1028785192/oldlogfile.log and then killed the regionserver that was holding the .META. table
> 2009-11-19 09:08:36,012 INFO org.apache.hadoop.hbase.regionserver.wal.HLog: Splitting 1 hlog(s) in hdfs://b0:9000/hbase/.logs/b4,60020,1258637492773
> 2009-11-19 09:08:36,012 DEBUG org.apache.hadoop.hbase.regionserver.wal.HLog: Splitting hlog 1 of 1: hdfs://b0:9000/hbase/.logs/b4,60020,1258637492773/hlog.dat.1258637493128, length=0
> 2009-11-19 09:08:36,019 DEBUG org.apache.hadoop.hbase.regionserver.wal.HLog: Adding queue for .META.,,1
> 2009-11-19 09:08:36,037 DEBUG org.apache.hadoop.hbase.regionserver.wal.HLog: Pushed=795 entries from hdfs://b0:9000/hbase/.logs/b4,60020,1258637492773/hlog.dat.1258637493128
> 2009-11-19 09:08:36,038 DEBUG org.apache.hadoop.hbase.regionserver.wal.HLog: Thread got 795 to process
> 2009-11-19 09:08:36,043 WARN org.apache.hadoop.hbase.regionserver.wal.HLog: Old hlog file hdfs://b0:9000/hbase/.META./1028785192/oldlogfile.log already exists. Copying existing file to new file
> 2009-11-19 09:08:36,079 WARN org.apache.hadoop.hbase.regionserver.wal.HLog: Got while writing region .META.,,1 log java.io.EOFException
> 2009-11-19 09:08:36,081 INFO org.apache.hadoop.hbase.regionserver.wal.HLog: hlog file splitting completed in 70 millis for hdfs://b0:9000/hbase/.logs/b4,60020,1258637492773

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.


[jira] Commented: (HBASE-1994) Master will lose hlog entries while splitting if region has empty oldlogfile.log

Posted by "Lars George (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/HBASE-1994?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12780128#action_12780128 ] 

Lars George commented on HBASE-1994:
------------------------------------

>From IRC

{code}
<larsgeorge> the write path is in the thread
<clehene> it's wap
<larsgeorge> yeah
<larsgeorge> urgh
<larsgeorge> it also only catches IOException
<larsgeorge> I only know from experience that uncaught exceptions in threads rarely get logged
<larsgeorge> and if then not proper as the stack is different
<clehene> uhm... however an exception in that big try could live it empty
<larsgeorge> yes
<larsgeorge> fragile
<clehene> and next time it would split 
<clehene> it would just throw away all edits because it fails with EOF
<larsgeorge> could be
<larsgeorge> the read deletes the input
<St^Ack> anything in the .out files?
<larsgeorge> so when the write fails
<larsgeorge> tough luck?
<larsgeorge> the read part has the delete in the finally
<larsgeorge> so the input log is deleted for sure
<St^Ack> cosmin you think the master went down while it was splitting a log?
<larsgeorge> shouldn't that be done at the very end? Or in some sort of Atomic commit
<larsgeorge> as in have a big try/catch/finally and either rollback the split or apply it
<St^Ack> I think that general split state needs to be hoisted up into zk
<St^Ack> master takes out a 'lock'
<St^Ack> one that will evaporate if it dies mid-split
<larsgeorge> hmm
<larsgeorge> for a master crash?
<larsgeorge> as this is all done in master start anyways
<St^Ack> yeah
<St^Ack> an empty oldlogfile.log -- i can't stand typing the name even -- would seem to an exit w/o a call to close
{code}

> Master will lose hlog entries while splitting if region has empty oldlogfile.log
> --------------------------------------------------------------------------------
>
>                 Key: HBASE-1994
>                 URL: https://issues.apache.org/jira/browse/HBASE-1994
>             Project: Hadoop HBase
>          Issue Type: Bug
>          Components: master
>    Affects Versions: 0.21.0
>            Reporter: Cosmin Lehene
>            Priority: Blocker
>             Fix For: 0.21.0
>
>   Original Estimate: 48h
>  Remaining Estimate: 48h
>
> I don't know yet how an empty oldlogfile.log can exist, however it happened.
> Master will fail to put the splits in the region oldlogfile.log if an empty oldlogfile.log already exists there.
> This is the master log after I artificially reproduced it by placing an empty oldlogfile.log in /hbase/.META./1028785192/oldlogfile.log and then killed the regionserver that was holding the .META. table
> 2009-11-19 09:08:36,012 INFO org.apache.hadoop.hbase.regionserver.wal.HLog: Splitting 1 hlog(s) in hdfs://b0:9000/hbase/.logs/b4,60020,1258637492773
> 2009-11-19 09:08:36,012 DEBUG org.apache.hadoop.hbase.regionserver.wal.HLog: Splitting hlog 1 of 1: hdfs://b0:9000/hbase/.logs/b4,60020,1258637492773/hlog.dat.1258637493128, length=0
> 2009-11-19 09:08:36,019 DEBUG org.apache.hadoop.hbase.regionserver.wal.HLog: Adding queue for .META.,,1
> 2009-11-19 09:08:36,037 DEBUG org.apache.hadoop.hbase.regionserver.wal.HLog: Pushed=795 entries from hdfs://b0:9000/hbase/.logs/b4,60020,1258637492773/hlog.dat.1258637493128
> 2009-11-19 09:08:36,038 DEBUG org.apache.hadoop.hbase.regionserver.wal.HLog: Thread got 795 to process
> 2009-11-19 09:08:36,043 WARN org.apache.hadoop.hbase.regionserver.wal.HLog: Old hlog file hdfs://b0:9000/hbase/.META./1028785192/oldlogfile.log already exists. Copying existing file to new file
> 2009-11-19 09:08:36,079 WARN org.apache.hadoop.hbase.regionserver.wal.HLog: Got while writing region .META.,,1 log java.io.EOFException
> 2009-11-19 09:08:36,081 INFO org.apache.hadoop.hbase.regionserver.wal.HLog: hlog file splitting completed in 70 millis for hdfs://b0:9000/hbase/.logs/b4,60020,1258637492773

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.


[jira] Commented: (HBASE-1994) Master will lose hlog entries while splitting if region has empty oldlogfile.log

Posted by "Lars George (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/HBASE-1994?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12780514#action_12780514 ] 

Lars George commented on HBASE-1994:
------------------------------------

Based on the above the plan could be to use ZK to flag a split in progress and then do the process more resilient

- Flag in ZK that split is running
- Read log but do not delete it yet
- Write logs into region directories with a temp extension and a sequential filename
- Once this all succeeded delete old logs and rename split logs to have non-temp extension
- Clear ZK flag

That way if something happens the original logs are still preserved until properly split. A split dying halfway through is also OK as the temp split files can safely be removed.

> Master will lose hlog entries while splitting if region has empty oldlogfile.log
> --------------------------------------------------------------------------------
>
>                 Key: HBASE-1994
>                 URL: https://issues.apache.org/jira/browse/HBASE-1994
>             Project: Hadoop HBase
>          Issue Type: Bug
>          Components: master
>    Affects Versions: 0.21.0
>            Reporter: Cosmin Lehene
>            Priority: Blocker
>             Fix For: 0.21.0
>
>   Original Estimate: 48h
>  Remaining Estimate: 48h
>
> I don't know yet how an empty oldlogfile.log can exist, however it happened.
> Master will fail to put the splits in the region oldlogfile.log if an empty oldlogfile.log already exists there.
> This is the master log after I artificially reproduced it by placing an empty oldlogfile.log in /hbase/.META./1028785192/oldlogfile.log and then killed the regionserver that was holding the .META. table
> 2009-11-19 09:08:36,012 INFO org.apache.hadoop.hbase.regionserver.wal.HLog: Splitting 1 hlog(s) in hdfs://b0:9000/hbase/.logs/b4,60020,1258637492773
> 2009-11-19 09:08:36,012 DEBUG org.apache.hadoop.hbase.regionserver.wal.HLog: Splitting hlog 1 of 1: hdfs://b0:9000/hbase/.logs/b4,60020,1258637492773/hlog.dat.1258637493128, length=0
> 2009-11-19 09:08:36,019 DEBUG org.apache.hadoop.hbase.regionserver.wal.HLog: Adding queue for .META.,,1
> 2009-11-19 09:08:36,037 DEBUG org.apache.hadoop.hbase.regionserver.wal.HLog: Pushed=795 entries from hdfs://b0:9000/hbase/.logs/b4,60020,1258637492773/hlog.dat.1258637493128
> 2009-11-19 09:08:36,038 DEBUG org.apache.hadoop.hbase.regionserver.wal.HLog: Thread got 795 to process
> 2009-11-19 09:08:36,043 WARN org.apache.hadoop.hbase.regionserver.wal.HLog: Old hlog file hdfs://b0:9000/hbase/.META./1028785192/oldlogfile.log already exists. Copying existing file to new file
> 2009-11-19 09:08:36,079 WARN org.apache.hadoop.hbase.regionserver.wal.HLog: Got while writing region .META.,,1 log java.io.EOFException
> 2009-11-19 09:08:36,081 INFO org.apache.hadoop.hbase.regionserver.wal.HLog: hlog file splitting completed in 70 millis for hdfs://b0:9000/hbase/.logs/b4,60020,1258637492773

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.


[jira] Commented: (HBASE-1994) Master will lose hlog entries while splitting if region has empty oldlogfile.log

Posted by "Jean-Daniel Cryans (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/HBASE-1994?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12785452#action_12785452 ] 

Jean-Daniel Cryans commented on HBASE-1994:
-------------------------------------------

Good stuff, this effectively covers the scope of this jira. +1

> Master will lose hlog entries while splitting if region has empty oldlogfile.log
> --------------------------------------------------------------------------------
>
>                 Key: HBASE-1994
>                 URL: https://issues.apache.org/jira/browse/HBASE-1994
>             Project: Hadoop HBase
>          Issue Type: Bug
>          Components: master
>    Affects Versions: 0.21.0
>            Reporter: Cosmin Lehene
>            Assignee: Lars George
>            Priority: Blocker
>             Fix For: 0.21.0
>
>         Attachments: HBASE-1994-EOFE.patch
>
>   Original Estimate: 48h
>  Remaining Estimate: 48h
>
> I don't know yet how an empty oldlogfile.log can exist, however it happened.
> Master will fail to put the splits in the region oldlogfile.log if an empty oldlogfile.log already exists there.
> This is the master log after I artificially reproduced it by placing an empty oldlogfile.log in /hbase/.META./1028785192/oldlogfile.log and then killed the regionserver that was holding the .META. table
> 2009-11-19 09:08:36,012 INFO org.apache.hadoop.hbase.regionserver.wal.HLog: Splitting 1 hlog(s) in hdfs://b0:9000/hbase/.logs/b4,60020,1258637492773
> 2009-11-19 09:08:36,012 DEBUG org.apache.hadoop.hbase.regionserver.wal.HLog: Splitting hlog 1 of 1: hdfs://b0:9000/hbase/.logs/b4,60020,1258637492773/hlog.dat.1258637493128, length=0
> 2009-11-19 09:08:36,019 DEBUG org.apache.hadoop.hbase.regionserver.wal.HLog: Adding queue for .META.,,1
> 2009-11-19 09:08:36,037 DEBUG org.apache.hadoop.hbase.regionserver.wal.HLog: Pushed=795 entries from hdfs://b0:9000/hbase/.logs/b4,60020,1258637492773/hlog.dat.1258637493128
> 2009-11-19 09:08:36,038 DEBUG org.apache.hadoop.hbase.regionserver.wal.HLog: Thread got 795 to process
> 2009-11-19 09:08:36,043 WARN org.apache.hadoop.hbase.regionserver.wal.HLog: Old hlog file hdfs://b0:9000/hbase/.META./1028785192/oldlogfile.log already exists. Copying existing file to new file
> 2009-11-19 09:08:36,079 WARN org.apache.hadoop.hbase.regionserver.wal.HLog: Got while writing region .META.,,1 log java.io.EOFException
> 2009-11-19 09:08:36,081 INFO org.apache.hadoop.hbase.regionserver.wal.HLog: hlog file splitting completed in 70 millis for hdfs://b0:9000/hbase/.logs/b4,60020,1258637492773

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.


[jira] Resolved: (HBASE-1994) Master will lose hlog entries while splitting if region has empty oldlogfile.log

Posted by "stack (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/HBASE-1994?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

stack resolved HBASE-1994.
--------------------------

       Resolution: Fixed
    Fix Version/s: 0.20.3

Applied branch and trunk.  Thanks for the patch Lars.

> Master will lose hlog entries while splitting if region has empty oldlogfile.log
> --------------------------------------------------------------------------------
>
>                 Key: HBASE-1994
>                 URL: https://issues.apache.org/jira/browse/HBASE-1994
>             Project: Hadoop HBase
>          Issue Type: Bug
>          Components: master
>    Affects Versions: 0.21.0
>            Reporter: Cosmin Lehene
>            Assignee: Lars George
>            Priority: Blocker
>             Fix For: 0.20.3, 0.21.0
>
>         Attachments: HBASE-1994-EOFE.patch
>
>   Original Estimate: 48h
>  Remaining Estimate: 48h
>
> I don't know yet how an empty oldlogfile.log can exist, however it happened.
> Master will fail to put the splits in the region oldlogfile.log if an empty oldlogfile.log already exists there.
> This is the master log after I artificially reproduced it by placing an empty oldlogfile.log in /hbase/.META./1028785192/oldlogfile.log and then killed the regionserver that was holding the .META. table
> 2009-11-19 09:08:36,012 INFO org.apache.hadoop.hbase.regionserver.wal.HLog: Splitting 1 hlog(s) in hdfs://b0:9000/hbase/.logs/b4,60020,1258637492773
> 2009-11-19 09:08:36,012 DEBUG org.apache.hadoop.hbase.regionserver.wal.HLog: Splitting hlog 1 of 1: hdfs://b0:9000/hbase/.logs/b4,60020,1258637492773/hlog.dat.1258637493128, length=0
> 2009-11-19 09:08:36,019 DEBUG org.apache.hadoop.hbase.regionserver.wal.HLog: Adding queue for .META.,,1
> 2009-11-19 09:08:36,037 DEBUG org.apache.hadoop.hbase.regionserver.wal.HLog: Pushed=795 entries from hdfs://b0:9000/hbase/.logs/b4,60020,1258637492773/hlog.dat.1258637493128
> 2009-11-19 09:08:36,038 DEBUG org.apache.hadoop.hbase.regionserver.wal.HLog: Thread got 795 to process
> 2009-11-19 09:08:36,043 WARN org.apache.hadoop.hbase.regionserver.wal.HLog: Old hlog file hdfs://b0:9000/hbase/.META./1028785192/oldlogfile.log already exists. Copying existing file to new file
> 2009-11-19 09:08:36,079 WARN org.apache.hadoop.hbase.regionserver.wal.HLog: Got while writing region .META.,,1 log java.io.EOFException
> 2009-11-19 09:08:36,081 INFO org.apache.hadoop.hbase.regionserver.wal.HLog: hlog file splitting completed in 70 millis for hdfs://b0:9000/hbase/.logs/b4,60020,1258637492773

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.


[jira] Commented: (HBASE-1994) Master will lose hlog entries while splitting if region has empty oldlogfile.log

Posted by "Jean-Daniel Cryans (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/HBASE-1994?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12780741#action_12780741 ] 

Jean-Daniel Cryans commented on HBASE-1994:
-------------------------------------------

+1 

This is a good introductory work to distributed splitting. 

> Master will lose hlog entries while splitting if region has empty oldlogfile.log
> --------------------------------------------------------------------------------
>
>                 Key: HBASE-1994
>                 URL: https://issues.apache.org/jira/browse/HBASE-1994
>             Project: Hadoop HBase
>          Issue Type: Bug
>          Components: master
>    Affects Versions: 0.21.0
>            Reporter: Cosmin Lehene
>            Priority: Blocker
>             Fix For: 0.21.0
>
>   Original Estimate: 48h
>  Remaining Estimate: 48h
>
> I don't know yet how an empty oldlogfile.log can exist, however it happened.
> Master will fail to put the splits in the region oldlogfile.log if an empty oldlogfile.log already exists there.
> This is the master log after I artificially reproduced it by placing an empty oldlogfile.log in /hbase/.META./1028785192/oldlogfile.log and then killed the regionserver that was holding the .META. table
> 2009-11-19 09:08:36,012 INFO org.apache.hadoop.hbase.regionserver.wal.HLog: Splitting 1 hlog(s) in hdfs://b0:9000/hbase/.logs/b4,60020,1258637492773
> 2009-11-19 09:08:36,012 DEBUG org.apache.hadoop.hbase.regionserver.wal.HLog: Splitting hlog 1 of 1: hdfs://b0:9000/hbase/.logs/b4,60020,1258637492773/hlog.dat.1258637493128, length=0
> 2009-11-19 09:08:36,019 DEBUG org.apache.hadoop.hbase.regionserver.wal.HLog: Adding queue for .META.,,1
> 2009-11-19 09:08:36,037 DEBUG org.apache.hadoop.hbase.regionserver.wal.HLog: Pushed=795 entries from hdfs://b0:9000/hbase/.logs/b4,60020,1258637492773/hlog.dat.1258637493128
> 2009-11-19 09:08:36,038 DEBUG org.apache.hadoop.hbase.regionserver.wal.HLog: Thread got 795 to process
> 2009-11-19 09:08:36,043 WARN org.apache.hadoop.hbase.regionserver.wal.HLog: Old hlog file hdfs://b0:9000/hbase/.META./1028785192/oldlogfile.log already exists. Copying existing file to new file
> 2009-11-19 09:08:36,079 WARN org.apache.hadoop.hbase.regionserver.wal.HLog: Got while writing region .META.,,1 log java.io.EOFException
> 2009-11-19 09:08:36,081 INFO org.apache.hadoop.hbase.regionserver.wal.HLog: hlog file splitting completed in 70 millis for hdfs://b0:9000/hbase/.logs/b4,60020,1258637492773

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.


[jira] Updated: (HBASE-1994) Master will lose hlog entries while splitting if region has empty oldlogfile.log

Posted by "Lars George (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/HBASE-1994?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Lars George updated HBASE-1994:
-------------------------------

    Attachment: HBASE-1994-EOFE.patch

HBASE-1994-EOFE.patch simply handles the empty oldlogfile.log case where an exception is thrown when attempting to rename the zero byte length file.

> Master will lose hlog entries while splitting if region has empty oldlogfile.log
> --------------------------------------------------------------------------------
>
>                 Key: HBASE-1994
>                 URL: https://issues.apache.org/jira/browse/HBASE-1994
>             Project: Hadoop HBase
>          Issue Type: Bug
>          Components: master
>    Affects Versions: 0.21.0
>            Reporter: Cosmin Lehene
>            Priority: Blocker
>             Fix For: 0.21.0
>
>         Attachments: HBASE-1994-EOFE.patch
>
>   Original Estimate: 48h
>  Remaining Estimate: 48h
>
> I don't know yet how an empty oldlogfile.log can exist, however it happened.
> Master will fail to put the splits in the region oldlogfile.log if an empty oldlogfile.log already exists there.
> This is the master log after I artificially reproduced it by placing an empty oldlogfile.log in /hbase/.META./1028785192/oldlogfile.log and then killed the regionserver that was holding the .META. table
> 2009-11-19 09:08:36,012 INFO org.apache.hadoop.hbase.regionserver.wal.HLog: Splitting 1 hlog(s) in hdfs://b0:9000/hbase/.logs/b4,60020,1258637492773
> 2009-11-19 09:08:36,012 DEBUG org.apache.hadoop.hbase.regionserver.wal.HLog: Splitting hlog 1 of 1: hdfs://b0:9000/hbase/.logs/b4,60020,1258637492773/hlog.dat.1258637493128, length=0
> 2009-11-19 09:08:36,019 DEBUG org.apache.hadoop.hbase.regionserver.wal.HLog: Adding queue for .META.,,1
> 2009-11-19 09:08:36,037 DEBUG org.apache.hadoop.hbase.regionserver.wal.HLog: Pushed=795 entries from hdfs://b0:9000/hbase/.logs/b4,60020,1258637492773/hlog.dat.1258637493128
> 2009-11-19 09:08:36,038 DEBUG org.apache.hadoop.hbase.regionserver.wal.HLog: Thread got 795 to process
> 2009-11-19 09:08:36,043 WARN org.apache.hadoop.hbase.regionserver.wal.HLog: Old hlog file hdfs://b0:9000/hbase/.META./1028785192/oldlogfile.log already exists. Copying existing file to new file
> 2009-11-19 09:08:36,079 WARN org.apache.hadoop.hbase.regionserver.wal.HLog: Got while writing region .META.,,1 log java.io.EOFException
> 2009-11-19 09:08:36,081 INFO org.apache.hadoop.hbase.regionserver.wal.HLog: hlog file splitting completed in 70 millis for hdfs://b0:9000/hbase/.logs/b4,60020,1258637492773

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.


[jira] Commented: (HBASE-1994) Master will lose hlog entries while splitting if region has empty oldlogfile.log

Posted by "Lars George (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/HBASE-1994?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12785252#action_12785252 ] 

Lars George commented on HBASE-1994:
------------------------------------

I would say the patch is enough to complete this issue. The discussed items above should carried over to HBASE-1364 or a new issue should be opened instead. Makes sense?

> Master will lose hlog entries while splitting if region has empty oldlogfile.log
> --------------------------------------------------------------------------------
>
>                 Key: HBASE-1994
>                 URL: https://issues.apache.org/jira/browse/HBASE-1994
>             Project: Hadoop HBase
>          Issue Type: Bug
>          Components: master
>    Affects Versions: 0.21.0
>            Reporter: Cosmin Lehene
>            Priority: Blocker
>             Fix For: 0.21.0
>
>         Attachments: HBASE-1994-EOFE.patch
>
>   Original Estimate: 48h
>  Remaining Estimate: 48h
>
> I don't know yet how an empty oldlogfile.log can exist, however it happened.
> Master will fail to put the splits in the region oldlogfile.log if an empty oldlogfile.log already exists there.
> This is the master log after I artificially reproduced it by placing an empty oldlogfile.log in /hbase/.META./1028785192/oldlogfile.log and then killed the regionserver that was holding the .META. table
> 2009-11-19 09:08:36,012 INFO org.apache.hadoop.hbase.regionserver.wal.HLog: Splitting 1 hlog(s) in hdfs://b0:9000/hbase/.logs/b4,60020,1258637492773
> 2009-11-19 09:08:36,012 DEBUG org.apache.hadoop.hbase.regionserver.wal.HLog: Splitting hlog 1 of 1: hdfs://b0:9000/hbase/.logs/b4,60020,1258637492773/hlog.dat.1258637493128, length=0
> 2009-11-19 09:08:36,019 DEBUG org.apache.hadoop.hbase.regionserver.wal.HLog: Adding queue for .META.,,1
> 2009-11-19 09:08:36,037 DEBUG org.apache.hadoop.hbase.regionserver.wal.HLog: Pushed=795 entries from hdfs://b0:9000/hbase/.logs/b4,60020,1258637492773/hlog.dat.1258637493128
> 2009-11-19 09:08:36,038 DEBUG org.apache.hadoop.hbase.regionserver.wal.HLog: Thread got 795 to process
> 2009-11-19 09:08:36,043 WARN org.apache.hadoop.hbase.regionserver.wal.HLog: Old hlog file hdfs://b0:9000/hbase/.META./1028785192/oldlogfile.log already exists. Copying existing file to new file
> 2009-11-19 09:08:36,079 WARN org.apache.hadoop.hbase.regionserver.wal.HLog: Got while writing region .META.,,1 log java.io.EOFException
> 2009-11-19 09:08:36,081 INFO org.apache.hadoop.hbase.regionserver.wal.HLog: hlog file splitting completed in 70 millis for hdfs://b0:9000/hbase/.logs/b4,60020,1258637492773

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.


[jira] Commented: (HBASE-1994) Master will lose hlog entries while splitting if region has empty oldlogfile.log

Posted by "Cosmin Lehene (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/HBASE-1994?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12785956#action_12785956 ] 

Cosmin Lehene commented on HBASE-1994:
--------------------------------------

+1 

I just tested it and got 

{code:title=hbase-master.log}
2009-12-04 09:51:28,731 INFO org.apache.hadoop.hbase.regionserver.wal.HLog: Splitting 1 hlog(s) in hdfs://b0:9000/hbase/.logs/b3,60020,1259928238561
2009-12-04 09:51:28,770 WARN org.apache.hadoop.hbase.regionserver.wal.HLog: Old hlog file hdfs://b0:9000/hbase/.META./1028785192/oldlogfile.log is zero length. Deleting existing file
2009-12-04 09:51:28,875 INFO org.apache.hadoop.hbase.regionserver.wal.HLog: hlog file splitting completed in 145 millis for hdfs://b0:9000/hbase/.logs/b3,60020,1259928238561  {code}

No missing edits.

> Master will lose hlog entries while splitting if region has empty oldlogfile.log
> --------------------------------------------------------------------------------
>
>                 Key: HBASE-1994
>                 URL: https://issues.apache.org/jira/browse/HBASE-1994
>             Project: Hadoop HBase
>          Issue Type: Bug
>          Components: master
>    Affects Versions: 0.21.0
>            Reporter: Cosmin Lehene
>            Assignee: Lars George
>            Priority: Blocker
>             Fix For: 0.21.0
>
>         Attachments: HBASE-1994-EOFE.patch
>
>   Original Estimate: 48h
>  Remaining Estimate: 48h
>
> I don't know yet how an empty oldlogfile.log can exist, however it happened.
> Master will fail to put the splits in the region oldlogfile.log if an empty oldlogfile.log already exists there.
> This is the master log after I artificially reproduced it by placing an empty oldlogfile.log in /hbase/.META./1028785192/oldlogfile.log and then killed the regionserver that was holding the .META. table
> 2009-11-19 09:08:36,012 INFO org.apache.hadoop.hbase.regionserver.wal.HLog: Splitting 1 hlog(s) in hdfs://b0:9000/hbase/.logs/b4,60020,1258637492773
> 2009-11-19 09:08:36,012 DEBUG org.apache.hadoop.hbase.regionserver.wal.HLog: Splitting hlog 1 of 1: hdfs://b0:9000/hbase/.logs/b4,60020,1258637492773/hlog.dat.1258637493128, length=0
> 2009-11-19 09:08:36,019 DEBUG org.apache.hadoop.hbase.regionserver.wal.HLog: Adding queue for .META.,,1
> 2009-11-19 09:08:36,037 DEBUG org.apache.hadoop.hbase.regionserver.wal.HLog: Pushed=795 entries from hdfs://b0:9000/hbase/.logs/b4,60020,1258637492773/hlog.dat.1258637493128
> 2009-11-19 09:08:36,038 DEBUG org.apache.hadoop.hbase.regionserver.wal.HLog: Thread got 795 to process
> 2009-11-19 09:08:36,043 WARN org.apache.hadoop.hbase.regionserver.wal.HLog: Old hlog file hdfs://b0:9000/hbase/.META./1028785192/oldlogfile.log already exists. Copying existing file to new file
> 2009-11-19 09:08:36,079 WARN org.apache.hadoop.hbase.regionserver.wal.HLog: Got while writing region .META.,,1 log java.io.EOFException
> 2009-11-19 09:08:36,081 INFO org.apache.hadoop.hbase.regionserver.wal.HLog: hlog file splitting completed in 70 millis for hdfs://b0:9000/hbase/.logs/b4,60020,1258637492773

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.