You are viewing a plain text version of this content. The canonical link for it is here.
Posted to dev@hbase.apache.org by Todd Lipcon <to...@cloudera.com> on 2010/07/28 19:17:13 UTC

Re: Review Request: HBASE-2312: Rename HLog Dir when Splitting

-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
http://review.cloudera.org/r/396/#review513
-----------------------------------------------------------


This looks good, except that it requires the HDFS patches to be in place to even compile. Although I think it's OK to not fix the bug in the case that we're on stock HDFS, I think we still need to be able to run, buggily.

Did we determine that all of the other solutions were flawed/too complicated?


trunk/src/main/java/org/apache/hadoop/hbase/master/HMaster.java
<http://review.cloudera.org/r/396/#comment2061>

    Check return value of rename (this is one of those stupid APIs that returns false instead of throwing)



trunk/src/main/java/org/apache/hadoop/hbase/master/HMaster.java
<http://review.cloudera.org/r/396/#comment2062>

    rather than reassigning, just pass splitDir below?



trunk/src/main/java/org/apache/hadoop/hbase/master/ProcessServerShutdown.java
<http://review.cloudera.org/r/396/#comment2064>

    extract "-splitting" out to a constant?
    Or can we move some of this common code into HLog?



trunk/src/main/java/org/apache/hadoop/hbase/master/ProcessServerShutdown.java
<http://review.cloudera.org/r/396/#comment2063>

    maybe:
    assert !(logDirExists && splitDirExists) : "Both " + rsLogDir + " and " + rsSplitDir + " exist";
    so if the assertion failure happens it's more understandable


- Todd


On 2010-07-27 23:10:02, Nicolas wrote:
> 
> -----------------------------------------------------------
> This is an automatically generated e-mail. To reply, visit:
> http://review.cloudera.org/r/396/
> -----------------------------------------------------------
> 
> (Updated 2010-07-27 23:10:02)
> 
> 
> Review request for hbase.
> 
> 
> Summary
> -------
> 
> There is a very corner case when bad things could happen(ie data loss):
> 
> 1) RS #1 is going to roll its HLog - not yet created the new one, old one will get no more writes
> 2) RS #1 enters GC Pause of Death
> 3) Master lists HLog files of RS#1 that is has to split as RS#1 is dead, starts splitting
> 4) RS #1 wakes up, created the new HLog (previous one was rolled) and appends an edit - which is lost
> 
> Note that this fix requires a healthy dose of HDFS prerequisites: HDFS-617, HADOOP-6840, HADOOP-6886.  I encourage you to review those as well, give feedback, and hopefully give +1s so we can push the changes through.
> 
> 
> This addresses bug HBASE-2312.
>     http://issues.apache.org/jira/browse/HBASE-2312
> 
> 
> Diffs
> -----
> 
>   trunk/src/main/java/org/apache/hadoop/hbase/master/HMaster.java 979953 
>   trunk/src/main/java/org/apache/hadoop/hbase/master/ProcessServerShutdown.java 979953 
>   trunk/src/main/java/org/apache/hadoop/hbase/regionserver/wal/HLog.java 979953 
>   trunk/src/main/java/org/apache/hadoop/hbase/regionserver/wal/SequenceFileLogWriter.java 979953 
>   trunk/src/test/java/org/apache/hadoop/hbase/regionserver/wal/TestHLogSplit.java 979953 
> 
> Diff: http://review.cloudera.org/r/396/diff
> 
> 
> Testing
> -------
> 
> mvn test;
> bin/start-hbase.sh
> bin/hbase shell < scan '.META.', get, put, etc
> 
> 
> Thanks,
> 
> Nicolas
> 
>


Re: Review Request: HBASE-2312: Rename HLog Dir when Splitting

Posted by st...@duboce.net.

> On 2010-07-28 10:17:14, Todd Lipcon wrote:
> > This looks good, except that it requires the HDFS patches to be in place to even compile. Although I think it's OK to not fix the bug in the case that we're on stock HDFS, I think we still need to be able to run, buggily.
> > 
> > Did we determine that all of the other solutions were flawed/too complicated?
> 
> Nicolas wrote:
>     HBASE-2312 had some lengthy discussions that ultimately led to this route.  For 0.21 + 0.22, only HADOOP-6840 is needed.  That is a very minor change consisting of only small API additions (so, no regression needed) that Dhruba & I think the lack of risk will make it easy to approve.  0.20-append needs a couple more JIRAs, but that should be even less flack.  As long as we ship with 0.20-append HDFS or newer, we'll be fine.
>     
>     I understand that this diff is early and we will have to wait until we ship with the 0.20-append JAR before application.  Basically, I also wanted to show that the API change indeed fixes our issue painlessly.  Any +1s on HADOOP-6840 would be appreciated :)  Your comment is that we should also backward-support 0.20.3?
> 
> Todd Lipcon wrote:
>     Yea, I think we decided at one point that we should be able to run against a vanilla apache cluster, just that it would be "at your own risk" - ie that the bug fixes wouldn't necessarily work. EG this is why we do the reflection to check for the syncFs() method and warn in the case when it's not there, but continue to function.
>     
>     In this patch, it would actually fail to work at all, since the RPC for non-recursive create would get an error at the NN.

We'll ship with a branch-0.20-append but yeah, current thinking is that we should be able to run on an hadoop 0.20.x that does not have a working sync.

We've been known to change our minds.  Start a discussion out on dev list if want to argue hbase 0.90.x requires a working sync.


- stack


-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
http://review.cloudera.org/r/396/#review513
-----------------------------------------------------------


On 2010-07-27 23:10:02, Nicolas wrote:
> 
> -----------------------------------------------------------
> This is an automatically generated e-mail. To reply, visit:
> http://review.cloudera.org/r/396/
> -----------------------------------------------------------
> 
> (Updated 2010-07-27 23:10:02)
> 
> 
> Review request for hbase.
> 
> 
> Summary
> -------
> 
> There is a very corner case when bad things could happen(ie data loss):
> 
> 1) RS #1 is going to roll its HLog - not yet created the new one, old one will get no more writes
> 2) RS #1 enters GC Pause of Death
> 3) Master lists HLog files of RS#1 that is has to split as RS#1 is dead, starts splitting
> 4) RS #1 wakes up, created the new HLog (previous one was rolled) and appends an edit - which is lost
> 
> Note that this fix requires a healthy dose of HDFS prerequisites: HDFS-617, HADOOP-6840, HADOOP-6886.  I encourage you to review those as well, give feedback, and hopefully give +1s so we can push the changes through.
> 
> 
> This addresses bug HBASE-2312.
>     http://issues.apache.org/jira/browse/HBASE-2312
> 
> 
> Diffs
> -----
> 
>   trunk/src/main/java/org/apache/hadoop/hbase/master/HMaster.java 979953 
>   trunk/src/main/java/org/apache/hadoop/hbase/master/ProcessServerShutdown.java 979953 
>   trunk/src/main/java/org/apache/hadoop/hbase/regionserver/wal/HLog.java 979953 
>   trunk/src/main/java/org/apache/hadoop/hbase/regionserver/wal/SequenceFileLogWriter.java 979953 
>   trunk/src/test/java/org/apache/hadoop/hbase/regionserver/wal/TestHLogSplit.java 979953 
> 
> Diff: http://review.cloudera.org/r/396/diff
> 
> 
> Testing
> -------
> 
> mvn test;
> bin/start-hbase.sh
> bin/hbase shell < scan '.META.', get, put, etc
> 
> 
> Thanks,
> 
> Nicolas
> 
>


Re: Review Request: HBASE-2312: Rename HLog Dir when Splitting

Posted by Todd Lipcon <to...@cloudera.com>.

> On 2010-07-28 10:17:14, Todd Lipcon wrote:
> > This looks good, except that it requires the HDFS patches to be in place to even compile. Although I think it's OK to not fix the bug in the case that we're on stock HDFS, I think we still need to be able to run, buggily.
> > 
> > Did we determine that all of the other solutions were flawed/too complicated?
> 
> Nicolas wrote:
>     HBASE-2312 had some lengthy discussions that ultimately led to this route.  For 0.21 + 0.22, only HADOOP-6840 is needed.  That is a very minor change consisting of only small API additions (so, no regression needed) that Dhruba & I think the lack of risk will make it easy to approve.  0.20-append needs a couple more JIRAs, but that should be even less flack.  As long as we ship with 0.20-append HDFS or newer, we'll be fine.
>     
>     I understand that this diff is early and we will have to wait until we ship with the 0.20-append JAR before application.  Basically, I also wanted to show that the API change indeed fixes our issue painlessly.  Any +1s on HADOOP-6840 would be appreciated :)  Your comment is that we should also backward-support 0.20.3?

Yea, I think we decided at one point that we should be able to run against a vanilla apache cluster, just that it would be "at your own risk" - ie that the bug fixes wouldn't necessarily work. EG this is why we do the reflection to check for the syncFs() method and warn in the case when it's not there, but continue to function.

In this patch, it would actually fail to work at all, since the RPC for non-recursive create would get an error at the NN.


- Todd


-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
http://review.cloudera.org/r/396/#review513
-----------------------------------------------------------


On 2010-07-27 23:10:02, Nicolas wrote:
> 
> -----------------------------------------------------------
> This is an automatically generated e-mail. To reply, visit:
> http://review.cloudera.org/r/396/
> -----------------------------------------------------------
> 
> (Updated 2010-07-27 23:10:02)
> 
> 
> Review request for hbase.
> 
> 
> Summary
> -------
> 
> There is a very corner case when bad things could happen(ie data loss):
> 
> 1) RS #1 is going to roll its HLog - not yet created the new one, old one will get no more writes
> 2) RS #1 enters GC Pause of Death
> 3) Master lists HLog files of RS#1 that is has to split as RS#1 is dead, starts splitting
> 4) RS #1 wakes up, created the new HLog (previous one was rolled) and appends an edit - which is lost
> 
> Note that this fix requires a healthy dose of HDFS prerequisites: HDFS-617, HADOOP-6840, HADOOP-6886.  I encourage you to review those as well, give feedback, and hopefully give +1s so we can push the changes through.
> 
> 
> This addresses bug HBASE-2312.
>     http://issues.apache.org/jira/browse/HBASE-2312
> 
> 
> Diffs
> -----
> 
>   trunk/src/main/java/org/apache/hadoop/hbase/master/HMaster.java 979953 
>   trunk/src/main/java/org/apache/hadoop/hbase/master/ProcessServerShutdown.java 979953 
>   trunk/src/main/java/org/apache/hadoop/hbase/regionserver/wal/HLog.java 979953 
>   trunk/src/main/java/org/apache/hadoop/hbase/regionserver/wal/SequenceFileLogWriter.java 979953 
>   trunk/src/test/java/org/apache/hadoop/hbase/regionserver/wal/TestHLogSplit.java 979953 
> 
> Diff: http://review.cloudera.org/r/396/diff
> 
> 
> Testing
> -------
> 
> mvn test;
> bin/start-hbase.sh
> bin/hbase shell < scan '.META.', get, put, etc
> 
> 
> Thanks,
> 
> Nicolas
> 
>