You are viewing a plain text version of this content. The canonical link for it is here.

Posted to issues@hbase.apache.org by "Todd Lipcon (JIRA)" <ji...@apache.org> on 2010/06/15 03:57:14 UTC

[jira] Created: (HBASE-2729) flushCache should write to a tmp directory and then move into the store directory

flushCache should write to a tmp directory and then move into the store directory
---------------------------------------------------------------------------------

                 Key: HBASE-2729
                 URL: https://issues.apache.org/jira/browse/HBASE-2729
             Project: HBase
          Issue Type: Bug
          Components: regionserver
    Affects Versions: 0.21.0
            Reporter: Todd Lipcon
            Priority: Blocker


Currently it appears that internalFlushCache writes directly to the target spot of the flushed data. The finally() block appends the metadata and closes the file as if nothing bad went wrong in case of an exception. This is really bad, since it means that an IOE in the middle of flushing cache could easily write a valid looking file with only half the data, which would then prevent us from recovering those edits during log replay.

Instead, it should flush to a tmp location and move it into the region dir only after it's successfully written.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.

[jira] Commented: (HBASE-2729) flushCache should write to a tmp directory and then move into the store directory

Posted by "stack (JIRA)" <ji...@apache.org>.

    [ https://issues.apache.org/jira/browse/HBASE-2729?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12881575#action_12881575 ] 

stack commented on HBASE-2729:
------------------------------

The patch has ' "_tmp");' in it rather than '.tmp'.

We just going to let compaction dross pile up over life of a region?

-        doRegionCompactionCleanup();


Otherwise, patch looks good to me.  I'm +1 on commit.

> flushCache should write to a tmp directory and then move into the store directory
> ---------------------------------------------------------------------------------
>
>                 Key: HBASE-2729
>                 URL: https://issues.apache.org/jira/browse/HBASE-2729
>             Project: HBase
>          Issue Type: Bug
>          Components: regionserver
>    Affects Versions: 0.21.0
>            Reporter: Todd Lipcon
>            Assignee: stack
>            Priority: Blocker
>         Attachments: hbase-2729.txt
>
>
> Currently it appears that internalFlushCache writes directly to the target spot of the flushed data. The finally() block appends the metadata and closes the file as if nothing bad went wrong in case of an exception. This is really bad, since it means that an IOE in the middle of flushing cache could easily write a valid looking file with only half the data, which would then prevent us from recovering those edits during log replay.
> Instead, it should flush to a tmp location and move it into the region dir only after it's successfully written.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.

[jira] Resolved: (HBASE-2729) flushCache should write to a tmp directory and then move into the store directory

Posted by "Todd Lipcon (JIRA)" <ji...@apache.org>.

     [ https://issues.apache.org/jira/browse/HBASE-2729?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Todd Lipcon resolved HBASE-2729.
--------------------------------

     Hadoop Flags: [Reviewed]
    Fix Version/s: 0.21.0
       Resolution: Fixed

Committed to 0.89.20100621 and trunk

> flushCache should write to a tmp directory and then move into the store directory
> ---------------------------------------------------------------------------------
>
>                 Key: HBASE-2729
>                 URL: https://issues.apache.org/jira/browse/HBASE-2729
>             Project: HBase
>          Issue Type: Bug
>          Components: regionserver
>    Affects Versions: 0.21.0
>            Reporter: Todd Lipcon
>            Assignee: stack
>            Priority: Blocker
>             Fix For: 0.21.0
>
>         Attachments: hbase-2729.txt, hbase-2729.txt
>
>
> Currently it appears that internalFlushCache writes directly to the target spot of the flushed data. The finally() block appends the metadata and closes the file as if nothing bad went wrong in case of an exception. This is really bad, since it means that an IOE in the middle of flushing cache could easily write a valid looking file with only half the data, which would then prevent us from recovering those edits during log replay.
> Instead, it should flush to a tmp location and move it into the region dir only after it's successfully written.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.

[jira] Commented: (HBASE-2729) flushCache should write to a tmp directory and then move into the store directory

Posted by "HBase Review Board (JIRA)" <ji...@apache.org>.

    [ https://issues.apache.org/jira/browse/HBASE-2729?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12881437#action_12881437 ] 

HBase Review Board commented on HBASE-2729:
-------------------------------------------

Message from: "Todd Lipcon" <to...@cloudera.com>

-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
http://review.hbase.org/r/216/
-----------------------------------------------------------

(Updated 2010-06-22 15:55:16.514925)


Review request for hbase, stack and Ryan Rawson.


Changes
-------

TestMasterTransitions exposed a clunker of a bug here. In the previous iteration, I had put the tmp directory in /hbase/table/.tmp instead of /hbase/table/region/.tmp - I thought region.basedir was the region dir, not the table dir. So, when a new region was opened, it removed the tmp files being written by other region servers and made the world explode.

This iteration renames those variables to be more clear, and fixes the bug so the tmp dir is within the region, not the table.


Summary
-------

Fixes bugs where an exception in the middle of flushing a file leaves a half-written StoreFile in the region dir, preventing that region from recovering, or, in the case of transient errors, causing silent loss of half a file's worth of data.

I also got rid of the compaction dir here, and am just using one region-wide tmp dir. Is there some reason this is a bad idea?


This addresses bug HBASE-2729.
    http://issues.apache.org/jira/browse/HBASE-2729


Diffs (updated)
-----

  src/main/java/org/apache/hadoop/hbase/regionserver/HRegion.java 1794df8 
  src/main/java/org/apache/hadoop/hbase/regionserver/Store.java 04b7522 
  src/test/java/org/apache/hadoop/hbase/HBaseTestCase.java dc38b3b 
  src/test/java/org/apache/hadoop/hbase/regionserver/TestCompaction.java 9e5ca46 
  src/test/java/org/apache/hadoop/hbase/regionserver/TestStore.java a65e947 

Diff: http://review.hbase.org/r/216/diff


Testing
-------

Ran TestCompaction and TestStore. Will start a cluster test running before I go to bed.


Thanks,

Todd




> flushCache should write to a tmp directory and then move into the store directory
> ---------------------------------------------------------------------------------
>
>                 Key: HBASE-2729
>                 URL: https://issues.apache.org/jira/browse/HBASE-2729
>             Project: HBase
>          Issue Type: Bug
>          Components: regionserver
>    Affects Versions: 0.21.0
>            Reporter: Todd Lipcon
>            Assignee: stack
>            Priority: Blocker
>
> Currently it appears that internalFlushCache writes directly to the target spot of the flushed data. The finally() block appends the metadata and closes the file as if nothing bad went wrong in case of an exception. This is really bad, since it means that an IOE in the middle of flushing cache could easily write a valid looking file with only half the data, which would then prevent us from recovering those edits during log replay.
> Instead, it should flush to a tmp location and move it into the region dir only after it's successfully written.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.

[jira] Commented: (HBASE-2729) flushCache should write to a tmp directory and then move into the store directory

Posted by "Todd Lipcon (JIRA)" <ji...@apache.org>.

    [ https://issues.apache.org/jira/browse/HBASE-2729?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12881579#action_12881579 ] 

Todd Lipcon commented on HBASE-2729:
------------------------------------

bq. The patch has ' "_tmp");' in it rather than '.tmp'.

oops... let me double check this is the latest patch, I thought I fixed that. Will make sure it's right before commit.

bq. We just going to let compaction dross pile up over life of a region?

I don't think a compaction should leave data in tmp except if the compaction fails, right? Looking at my load test I'm running right now, the .tmp dirs are there but they're empty, even though compactions have run.

> flushCache should write to a tmp directory and then move into the store directory
> ---------------------------------------------------------------------------------
>
>                 Key: HBASE-2729
>                 URL: https://issues.apache.org/jira/browse/HBASE-2729
>             Project: HBase
>          Issue Type: Bug
>          Components: regionserver
>    Affects Versions: 0.21.0
>            Reporter: Todd Lipcon
>            Assignee: stack
>            Priority: Blocker
>         Attachments: hbase-2729.txt
>
>
> Currently it appears that internalFlushCache writes directly to the target spot of the flushed data. The finally() block appends the metadata and closes the file as if nothing bad went wrong in case of an exception. This is really bad, since it means that an IOE in the middle of flushing cache could easily write a valid looking file with only half the data, which would then prevent us from recovering those edits during log replay.
> Instead, it should flush to a tmp location and move it into the region dir only after it's successfully written.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.

[jira] Updated: (HBASE-2729) flushCache should write to a tmp directory and then move into the store directory

Posted by "Todd Lipcon (JIRA)" <ji...@apache.org>.

     [ https://issues.apache.org/jira/browse/HBASE-2729?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Todd Lipcon updated HBASE-2729:
-------------------------------

    Attachment: hbase-2729.txt

Aha, here is the real patch...

> flushCache should write to a tmp directory and then move into the store directory
> ---------------------------------------------------------------------------------
>
>                 Key: HBASE-2729
>                 URL: https://issues.apache.org/jira/browse/HBASE-2729
>             Project: HBase
>          Issue Type: Bug
>          Components: regionserver
>    Affects Versions: 0.21.0
>            Reporter: Todd Lipcon
>            Assignee: stack
>            Priority: Blocker
>         Attachments: hbase-2729.txt, hbase-2729.txt
>
>
> Currently it appears that internalFlushCache writes directly to the target spot of the flushed data. The finally() block appends the metadata and closes the file as if nothing bad went wrong in case of an exception. This is really bad, since it means that an IOE in the middle of flushing cache could easily write a valid looking file with only half the data, which would then prevent us from recovering those edits during log replay.
> Instead, it should flush to a tmp location and move it into the region dir only after it's successfully written.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.

[jira] Commented: (HBASE-2729) flushCache should write to a tmp directory and then move into the store directory

Posted by "HBase Review Board (JIRA)" <ji...@apache.org>.

    [ https://issues.apache.org/jira/browse/HBASE-2729?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12881117#action_12881117 ] 

HBase Review Board commented on HBASE-2729:
-------------------------------------------

Message from: "Todd Lipcon" <to...@cloudera.com>

bq.  On 2010-06-22 00:14:05, stack wrote:
bq.  > +1 (if it passes all tests).  Nit-picks below.

Cool, I'll run it through my Hudson overnight, plus running a cluster test on my 5-node test cluster now. Will commit tomorrow midday with changes below addressed assuming testing goes OK.

bq.  On 2010-06-22 00:14:05, stack wrote:
bq.  > src/main/java/org/apache/hadoop/hbase/regionserver/Store.java, line 333
bq.  > <http://review.hbase.org/r/216/diff/1/?file=1529#file1529line333>
bq.  >
bq.  >     ? There was notion of a '_tmp' already?
bq.  >     
bq.  >     I'd say name it '.tmp'... since a '.' prefix seems to be our convention given logs dir at top-level has a '.' prefix.

yea, the bulk load stuff uses a tmp dir if you ask the regionserver to load a file which is stored on a different filesystem than the region itself.

I'll switch to .tmp though as you suggest (and move it into HConstants to replace the now-unused compactions.dir or what have you)

bq.  On 2010-06-22 00:14:05, stack wrote:
bq.  > src/main/java/org/apache/hadoop/hbase/regionserver/Store.java, line 464
bq.  > <http://review.hbase.org/r/216/diff/1/?file=1529#file1529line464>
bq.  >
bq.  >     Why not keep old name and just move dirs?  Why create a new unique name?

I was worried that what's unique in the old dir might not be unique in the new

bq.  On 2010-06-22 00:14:05, stack wrote:
bq.  > src/test/java/org/apache/hadoop/hbase/regionserver/TestStore.java, line 411
bq.  > <http://review.hbase.org/r/216/diff/1/?file=1531#file1531line411>
bq.  >
bq.  >     Nice test.

thx! at some point I'd like to combine this with the one in TestFSErrorsExposed and make a little mini-framework out of it... something for next month perhaps.

- Todd

-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
http://review.hbase.org/r/216/#review261
-----------------------------------------------------------

> flushCache should write to a tmp directory and then move into the store directory
> ---------------------------------------------------------------------------------
>
>                 Key: HBASE-2729
>                 URL: https://issues.apache.org/jira/browse/HBASE-2729
>             Project: HBase
>          Issue Type: Bug
>          Components: regionserver
>    Affects Versions: 0.21.0
>            Reporter: Todd Lipcon
>            Assignee: stack
>            Priority: Blocker
>
> Currently it appears that internalFlushCache writes directly to the target spot of the flushed data. The finally() block appends the metadata and closes the file as if nothing bad went wrong in case of an exception. This is really bad, since it means that an IOE in the middle of flushing cache could easily write a valid looking file with only half the data, which would then prevent us from recovering those edits during log replay.
> Instead, it should flush to a tmp location and move it into the region dir only after it's successfully written.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.

[jira] Assigned: (HBASE-2729) flushCache should write to a tmp directory and then move into the store directory

Posted by "Todd Lipcon (JIRA)" <ji...@apache.org>.

     [ https://issues.apache.org/jira/browse/HBASE-2729?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Todd Lipcon reassigned HBASE-2729:
----------------------------------

    Assignee: Todd Lipcon

> flushCache should write to a tmp directory and then move into the store directory
> ---------------------------------------------------------------------------------
>
>                 Key: HBASE-2729
>                 URL: https://issues.apache.org/jira/browse/HBASE-2729
>             Project: HBase
>          Issue Type: Bug
>          Components: regionserver
>    Affects Versions: 0.21.0
>            Reporter: Todd Lipcon
>            Assignee: Todd Lipcon
>            Priority: Blocker
>
> Currently it appears that internalFlushCache writes directly to the target spot of the flushed data. The finally() block appends the metadata and closes the file as if nothing bad went wrong in case of an exception. This is really bad, since it means that an IOE in the middle of flushing cache could easily write a valid looking file with only half the data, which would then prevent us from recovering those edits during log replay.
> Instead, it should flush to a tmp location and move it into the region dir only after it's successfully written.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.

[jira] Commented: (HBASE-2729) flushCache should write to a tmp directory and then move into the store directory

Posted by "HBase Review Board (JIRA)" <ji...@apache.org>.

    [ https://issues.apache.org/jira/browse/HBASE-2729?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12881104#action_12881104 ] 

HBase Review Board commented on HBASE-2729:
-------------------------------------------

Message from: "Todd Lipcon" <to...@cloudera.com>

-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
http://review.hbase.org/r/216/
-----------------------------------------------------------

Review request for hbase, stack and Ryan Rawson.


Summary
-------

Fixes bugs where an exception in the middle of flushing a file leaves a half-written StoreFile in the region dir, preventing that region from recovering, or, in the case of transient errors, causing silent loss of half a file's worth of data.

I also got rid of the compaction dir here, and am just using one region-wide tmp dir. Is there some reason this is a bad idea?


This addresses bug HBASE-2729.
    http://issues.apache.org/jira/browse/HBASE-2729


Diffs
-----

  src/main/java/org/apache/hadoop/hbase/regionserver/HRegion.java 1794df8 
  src/main/java/org/apache/hadoop/hbase/regionserver/Store.java 04b7522 
  src/test/java/org/apache/hadoop/hbase/regionserver/TestCompaction.java 9e5ca46 
  src/test/java/org/apache/hadoop/hbase/regionserver/TestStore.java a65e947 

Diff: http://review.hbase.org/r/216/diff


Testing
-------

Ran TestCompaction and TestStore. Will start a cluster test running before I go to bed.


Thanks,

Todd




> flushCache should write to a tmp directory and then move into the store directory
> ---------------------------------------------------------------------------------
>
>                 Key: HBASE-2729
>                 URL: https://issues.apache.org/jira/browse/HBASE-2729
>             Project: HBase
>          Issue Type: Bug
>          Components: regionserver
>    Affects Versions: 0.21.0
>            Reporter: Todd Lipcon
>            Assignee: stack
>            Priority: Blocker
>
> Currently it appears that internalFlushCache writes directly to the target spot of the flushed data. The finally() block appends the metadata and closes the file as if nothing bad went wrong in case of an exception. This is really bad, since it means that an IOE in the middle of flushing cache could easily write a valid looking file with only half the data, which would then prevent us from recovering those edits during log replay.
> Instead, it should flush to a tmp location and move it into the region dir only after it's successfully written.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.

[jira] Assigned: (HBASE-2729) flushCache should write to a tmp directory and then move into the store directory

Posted by "stack (JIRA)" <ji...@apache.org>.

     [ https://issues.apache.org/jira/browse/HBASE-2729?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

stack reassigned HBASE-2729:
----------------------------

    Assignee: stack  (was: Todd Lipcon)

> flushCache should write to a tmp directory and then move into the store directory
> ---------------------------------------------------------------------------------
>
>                 Key: HBASE-2729
>                 URL: https://issues.apache.org/jira/browse/HBASE-2729
>             Project: HBase
>          Issue Type: Bug
>          Components: regionserver
>    Affects Versions: 0.21.0
>            Reporter: Todd Lipcon
>            Assignee: stack
>            Priority: Blocker
>
> Currently it appears that internalFlushCache writes directly to the target spot of the flushed data. The finally() block appends the metadata and closes the file as if nothing bad went wrong in case of an exception. This is really bad, since it means that an IOE in the middle of flushing cache could easily write a valid looking file with only half the data, which would then prevent us from recovering those edits during log replay.
> Instead, it should flush to a tmp location and move it into the region dir only after it's successfully written.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.

[jira] Updated: (HBASE-2729) flushCache should write to a tmp directory and then move into the store directory

Posted by "Todd Lipcon (JIRA)" <ji...@apache.org>.

     [ https://issues.apache.org/jira/browse/HBASE-2729?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Todd Lipcon updated HBASE-2729:
-------------------------------

    Attachment: hbase-2729.txt

Here's patch that was previously up on reviewboard (rb is down due to EC2 suckage)
One change on top of what was on reviewboard: had to update heap sizes for HRegion and Store to get tests to pass again.

> flushCache should write to a tmp directory and then move into the store directory
> ---------------------------------------------------------------------------------
>
>                 Key: HBASE-2729
>                 URL: https://issues.apache.org/jira/browse/HBASE-2729
>             Project: HBase
>          Issue Type: Bug
>          Components: regionserver
>    Affects Versions: 0.21.0
>            Reporter: Todd Lipcon
>            Assignee: stack
>            Priority: Blocker
>         Attachments: hbase-2729.txt
>
>
> Currently it appears that internalFlushCache writes directly to the target spot of the flushed data. The finally() block appends the metadata and closes the file as if nothing bad went wrong in case of an exception. This is really bad, since it means that an IOE in the middle of flushing cache could easily write a valid looking file with only half the data, which would then prevent us from recovering those edits during log replay.
> Instead, it should flush to a tmp location and move it into the region dir only after it's successfully written.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.

[jira] Commented: (HBASE-2729) flushCache should write to a tmp directory and then move into the store directory

Posted by "stack (JIRA)" <ji...@apache.org>.

    [ https://issues.apache.org/jira/browse/HBASE-2729?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12881819#action_12881819 ] 

stack commented on HBASE-2729:
------------------------------

+1 (basedir -> tableDir is improvement)

> flushCache should write to a tmp directory and then move into the store directory
> ---------------------------------------------------------------------------------
>
>                 Key: HBASE-2729
>                 URL: https://issues.apache.org/jira/browse/HBASE-2729
>             Project: HBase
>          Issue Type: Bug
>          Components: regionserver
>    Affects Versions: 0.21.0
>            Reporter: Todd Lipcon
>            Assignee: stack
>            Priority: Blocker
>         Attachments: hbase-2729.txt, hbase-2729.txt
>
>
> Currently it appears that internalFlushCache writes directly to the target spot of the flushed data. The finally() block appends the metadata and closes the file as if nothing bad went wrong in case of an exception. This is really bad, since it means that an IOE in the middle of flushing cache could easily write a valid looking file with only half the data, which would then prevent us from recovering those edits during log replay.
> Instead, it should flush to a tmp location and move it into the region dir only after it's successfully written.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.

[jira] Commented: (HBASE-2729) flushCache should write to a tmp directory and then move into the store directory

Posted by "HBase Review Board (JIRA)" <ji...@apache.org>.

    [ https://issues.apache.org/jira/browse/HBASE-2729?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12881113#action_12881113 ] 

HBase Review Board commented on HBASE-2729:
-------------------------------------------

Message from: stack@duboce.net

-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
http://review.hbase.org/r/216/#review261
-----------------------------------------------------------

Ship it!


+1 (if it passes all tests).  Nit-picks below.


src/main/java/org/apache/hadoop/hbase/regionserver/Store.java
<http://review.hbase.org/r/216/#comment1110>

    ? There was notion of a '_tmp' already?
    
    I'd say name it '.tmp'... since a '.' prefix seems to be our convention given logs dir at top-level has a '.' prefix.



src/main/java/org/apache/hadoop/hbase/regionserver/Store.java
<http://review.hbase.org/r/216/#comment1111>

    Why not keep old name and just move dirs?  Why create a new unique name?



src/test/java/org/apache/hadoop/hbase/regionserver/TestStore.java
<http://review.hbase.org/r/216/#comment1112>

    Nice test.


- stack





> flushCache should write to a tmp directory and then move into the store directory
> ---------------------------------------------------------------------------------
>
>                 Key: HBASE-2729
>                 URL: https://issues.apache.org/jira/browse/HBASE-2729
>             Project: HBase
>          Issue Type: Bug
>          Components: regionserver
>    Affects Versions: 0.21.0
>            Reporter: Todd Lipcon
>            Assignee: stack
>            Priority: Blocker
>
> Currently it appears that internalFlushCache writes directly to the target spot of the flushed data. The finally() block appends the metadata and closes the file as if nothing bad went wrong in case of an exception. This is really bad, since it means that an IOE in the middle of flushing cache could easily write a valid looking file with only half the data, which would then prevent us from recovering those edits during log replay.
> Instead, it should flush to a tmp location and move it into the region dir only after it's successfully written.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.