Posted to issues@hbase.apache.org by "Todd Lipcon (JIRA)" <ji...@apache.org> on 2011/08/29 06:10:38 UTC
[jira] [Created] (HBASE-4270) IOE ignored during flush-on-close causes dataloss
IOE ignored during flush-on-close causes dataloss
-------------------------------------------------
Key: HBASE-4270
URL: https://issues.apache.org/jira/browse/HBASE-4270
Project: HBase
Issue Type: Bug
Components: regionserver
Affects Versions: 0.90.4, 0.92.0
Reporter: Todd Lipcon
Priority: Blocker
Fix For: 0.92.0
If the RS experiences an exception during the flush of a region while closing it, it currently catches the exception, logs a warning, and keeps going. If the exception was a DroppedSnapshotException, this means that it will silently drop any data that was in memstore when the region was closed.
Instead, the RS should do a hard abort so that its logs will be replayed.
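The difference between the current behavior and the proposed one can be sketched in a few lines of Java. This is only an illustration, not the actual patch: `Region`, `Server`, `closeSwallowing`, and `closeAborting` are hypothetical stand-ins invented for this sketch and are not the HBase classes or methods touched by the fix.

```java
import java.io.IOException;

public class CloseOnFlushFailureSketch {
    /** Hypothetical stand-in for a region whose close() flushes the memstore first. */
    interface Region {
        void close() throws IOException;
    }

    /** Hypothetical stand-in for the hosting server; it only records an abort request. */
    static class Server {
        boolean aborted = false;
        String abortReason = null;

        void abort(String reason, Throwable cause) {
            aborted = true;
            abortReason = reason;
        }
    }

    /** Buggy shape: the failure is logged and the close is still reported as successful. */
    static boolean closeSwallowing(Region region, Server server) {
        try {
            region.close();
        } catch (IOException e) {
            System.err.println("WARN: close failed, continuing: " + e.getMessage());
        }
        return true; // reported as cleanly closed even though the flush failed
    }

    /** Proposed shape: abort the server and rethrow so the close is never reported as clean. */
    static boolean closeAborting(Region region, Server server) {
        try {
            region.close();
        } catch (IOException e) {
            server.abort("Unrecoverable exception while closing region", e);
            throw new RuntimeException(e);
        }
        return true;
    }

    public static void main(String[] args) {
        // A region whose flush-on-close always fails.
        Region failing = () -> { throw new IOException("simulated flush failure"); };

        Server s1 = new Server();
        boolean reported = closeSwallowing(failing, s1);
        System.out.println("swallowing: reportedClosed=" + reported + ", aborted=" + s1.aborted);

        Server s2 = new Server();
        try {
            closeAborting(failing, s2);
        } catch (RuntimeException e) {
            System.out.println("aborting: aborted=" + s2.aborted);
        }
    }
}
```

In the first variant the caller cannot tell that anything went wrong; in the second the failure both marks the server as aborted and propagates to the caller.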
--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (HBASE-4270) IOE ignored during flush-on-close causes dataloss
Posted by "Hudson (JIRA)" <ji...@apache.org>.
[ https://issues.apache.org/jira/browse/HBASE-4270?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13102132#comment-13102132 ]
Hudson commented on HBASE-4270:
-------------------------------
Integrated in HBase-TRUNK #2198 (See [https://builds.apache.org/job/HBase-TRUNK/2198/])
HBASE-4270 IOE ignored during flush-on-close causes dataloss
HBASE-4270 IOE ignored during flush-on-close causes dataloss
stack :
Files :
* /hbase/trunk/src/test/java/org/apache/hadoop/hbase/regionserver/handler/MockRegionServerServices.java
* /hbase/trunk/src/test/java/org/apache/hadoop/hbase/regionserver/handler/MockServer.java
* /hbase/trunk/src/test/java/org/apache/hadoop/hbase/regionserver/handler/TestCloseRegionHandler.java
stack :
Files :
* /hbase/trunk/CHANGES.txt
* /hbase/trunk/src/main/java/org/apache/hadoop/hbase/regionserver/handler/CloseRegionHandler.java
* /hbase/trunk/src/test/java/org/apache/hadoop/hbase/regionserver/handler/TestOpenRegionHandler.java
[jira] [Commented] (HBASE-4270) IOE ignored during flush-on-close causes dataloss
Posted by "Jean-Daniel Cryans (JIRA)" <ji...@apache.org>.
[ https://issues.apache.org/jira/browse/HBASE-4270?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13093044#comment-13093044 ]
Jean-Daniel Cryans commented on HBASE-4270:
-------------------------------------------
Ah I thought I read it was happening during abort. That's a pretty big hole then!
[jira] [Commented] (HBASE-4270) IOE ignored during flush-on-close causes dataloss
Posted by "Todd Lipcon (JIRA)" <ji...@apache.org>.
[ https://issues.apache.org/jira/browse/HBASE-4270?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13093033#comment-13093033 ]
Todd Lipcon commented on HBASE-4270:
------------------------------------
Without the server.abort() call, the logs don't get replayed. It just closes the region, and reports successful close in ZK, at which point some other server picks up the region without the data.
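A toy model may make the two outcomes concrete. It is a deliberately simplified sketch, not HBase code: `ToyRegion`, `closeAfterFailedFlush`, and `reopen` are hypothetical names, and the single `replayLog` flag stands in for the whole recovery path that an abort triggers.

```java
import java.util.ArrayList;
import java.util.List;

public class ReplayOutcomeSketch {
    /** Toy region: every edit goes to a durable log and to an in-memory buffer. */
    static class ToyRegion {
        final List<String> wal = new ArrayList<>();        // durable log of edits
        final List<String> memstore = new ArrayList<>();   // in-memory edits, not yet flushed
        final List<String> storeFiles = new ArrayList<>(); // flushed, durable data

        void put(String edit) {
            wal.add(edit);
            memstore.add(edit);
        }

        /** Close after a failed flush: buffered edits are discarded, nothing reaches storeFiles. */
        void closeAfterFailedFlush() {
            memstore.clear();
        }
    }

    /**
     * Returns what the next server sees when it opens the region.
     * replayLog=false models a close that was reported as clean;
     * replayLog=true models an aborted server whose log is replayed.
     */
    static List<String> reopen(ToyRegion region, boolean replayLog) {
        List<String> visible = new ArrayList<>(region.storeFiles);
        if (replayLog) {
            for (String edit : region.wal) {
                if (!visible.contains(edit)) {
                    visible.add(edit);
                }
            }
        }
        return visible;
    }

    public static void main(String[] args) {
        ToyRegion region = new ToyRegion();
        region.put("row1");
        region.put("row2");
        region.closeAfterFailedFlush();

        System.out.println("reported clean close: " + reopen(region, false));
        System.out.println("abort, log replayed:  " + reopen(region, true));
    }
}
```

The edits are still in the log in both cases; what differs is whether anything ever reads the log again before the region is served from another server.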
[jira] [Commented] (HBASE-4270) IOE ignored during flush-on-close causes dataloss
Posted by "Jean-Daniel Cryans (JIRA)" <ji...@apache.org>.
[ https://issues.apache.org/jira/browse/HBASE-4270?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13092968#comment-13092968 ]
Jean-Daniel Cryans commented on HBASE-4270:
-------------------------------------------
Like you say, the logs are replayed, so how is that causing data loss?
[jira] [Commented] (HBASE-4270) IOE ignored during flush-on-close causes dataloss
Posted by "stack (JIRA)" <ji...@apache.org>.
[ https://issues.apache.org/jira/browse/HBASE-4270?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13102091#comment-13102091 ]
stack commented on HBASE-4270:
------------------------------
@Lars Doesn't look like it. This patch is about how we keep going after an IOE on close of a region rather than aborting the RS to get the WAL replayed. HBASE-4078 is about validating created storefiles post compaction.
[jira] [Updated] (HBASE-4270) IOE ignored during flush-on-close causes dataloss
Posted by "stack (JIRA)" <ji...@apache.org>.
[ https://issues.apache.org/jira/browse/HBASE-4270?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
stack updated HBASE-4270:
-------------------------
Status: Patch Available (was: Open)
Marking patch available.
[jira] [Commented] (HBASE-4270) IOE ignored during flush-on-close causes dataloss
Posted by "Lars Hofhansl (JIRA)" <ji...@apache.org>.
[ https://issues.apache.org/jira/browse/HBASE-4270?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13102079#comment-13102079 ]
Lars Hofhansl commented on HBASE-4270:
--------------------------------------
Is this related at all to HBASE-4078?
[jira] [Commented] (HBASE-4270) IOE ignored during flush-on-close causes dataloss
Posted by "jiraposter@reviews.apache.org (JIRA)" <ji...@apache.org>.
[ https://issues.apache.org/jira/browse/HBASE-4270?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13102053#comment-13102053 ]
jiraposter@reviews.apache.org commented on HBASE-4270:
------------------------------------------------------
-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/1784/#review1849
-----------------------------------------------------------
Ship it!
src/test/java/org/apache/hadoop/hbase/regionserver/handler/MockServer.java
<https://reviews.apache.org/r/1784/#comment4226>
Maybe this can be removed.
src/test/java/org/apache/hadoop/hbase/regionserver/handler/TestCloseRegionHandler.java
<https://reviews.apache.org/r/1784/#comment4227>
Nice.
- Ted
On 2011-09-09 23:08:12, Michael Stack wrote:
> -----------------------------------------------------------
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/1784/
> -----------------------------------------------------------
>
> (Updated 2011-09-09 23:08:12)
>
> Review request for hbase.
>
> Summary
> -------
> Todd wrote the patch for this issue. What's posted here is his patch plus a unit test. The diff is pretty big because I refactored TestOpenRegionHandler so I could share bits of it when creating the new TestCloseRegionHandler; the bulk of the patch is the shared mock server and shared mock regionserverservice files.
>
> This addresses bug hbase-4270.
> https://issues.apache.org/jira/browse/hbase-4270
>
> Diffs
> -----
> src/main/java/org/apache/hadoop/hbase/regionserver/handler/CloseRegionHandler.java b684af2
> src/test/java/org/apache/hadoop/hbase/regionserver/handler/MockRegionServerServices.java PRE-CREATION
> src/test/java/org/apache/hadoop/hbase/regionserver/handler/MockServer.java PRE-CREATION
> src/test/java/org/apache/hadoop/hbase/regionserver/handler/TestCloseRegionHandler.java PRE-CREATION
> src/test/java/org/apache/hadoop/hbase/regionserver/handler/TestOpenRegionHandler.java ab12968
>
> Diff: https://reviews.apache.org/r/1784/diff
>
> Testing
> -------
> I ran the new TestCloseRegionHandler test.
>
> Thanks,
> Michael
[jira] [Commented] (HBASE-4270) IOE ignored during flush-on-close causes dataloss
Posted by "jiraposter@reviews.apache.org (JIRA)" <ji...@apache.org>.
[ https://issues.apache.org/jira/browse/HBASE-4270?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13101630#comment-13101630 ]
jiraposter@reviews.apache.org commented on HBASE-4270:
------------------------------------------------------
-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/1784/
-----------------------------------------------------------
Review request for hbase.
Summary
-------
Todd wrote the patch for this issue. What's posted here is his patch plus a unit test. The diff is pretty big because I refactored TestOpenRegionHandler so I could share bits of it when creating the new TestCloseRegionHandler; the bulk of the patch is the shared mock server and shared mock regionserverservice files.
This addresses bug hbase-4270.
https://issues.apache.org/jira/browse/hbase-4270
Diffs
-----
src/main/java/org/apache/hadoop/hbase/regionserver/handler/CloseRegionHandler.java b684af2
src/test/java/org/apache/hadoop/hbase/regionserver/handler/MockRegionServerServices.java PRE-CREATION
src/test/java/org/apache/hadoop/hbase/regionserver/handler/MockServer.java PRE-CREATION
src/test/java/org/apache/hadoop/hbase/regionserver/handler/TestCloseRegionHandler.java PRE-CREATION
src/test/java/org/apache/hadoop/hbase/regionserver/handler/TestOpenRegionHandler.java ab12968
Diff: https://reviews.apache.org/r/1784/diff
Testing
-------
I ran the new TestCloseRegionHandler test.
Thanks,
Michael
[jira] [Updated] (HBASE-4270) IOE ignored during flush-on-close causes dataloss
Posted by "Todd Lipcon (JIRA)" <ji...@apache.org>.
[ https://issues.apache.org/jira/browse/HBASE-4270?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Todd Lipcon updated HBASE-4270:
-------------------------------
Attachment: 0001-HBASE-4270.-Abort-and-rethrow-errors-on-close-failur.patch
Basic fix. I've verified that this fixes the data loss. Needs a unit test.
[jira] [Updated] (HBASE-4270) IOE ignored during flush-on-close causes dataloss
Posted by "stack (JIRA)" <ji...@apache.org>.
[ https://issues.apache.org/jira/browse/HBASE-4270?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
stack updated HBASE-4270:
-------------------------
Resolution: Fixed
Assignee: Todd Lipcon
Hadoop Flags: [Reviewed]
Status: Resolved (was: Patch Available)
Committed to TRUNK. Committed only the fix to the branch (no unit tests; they are too hard to redo for 0.90 because of ServerName). Thanks for the review, Ted. I made the removal you suggested on commit.