Posted to dev@hbase.apache.org by "stack (JIRA)" <ji...@apache.org> on 2010/02/19 01:00:28 UTC

[jira] Created: (HBASE-2238) Review all transitions -- compactions, splits, region opens, log splitting -- for crash-proofyness and atomicity

Review all transitions -- compactions, splits, region opens, log splitting -- for crash-proofyness and atomicity
----------------------------------------------------------------------------------------------------------------

                 Key: HBASE-2238
                 URL: https://issues.apache.org/jira/browse/HBASE-2238
             Project: Hadoop HBase
          Issue Type: Bug
            Reporter: stack


This issue is about reviewing state transitions in HBase to ensure we're sufficiently hardened against crashes.  I see this as an umbrella issue under which we'd look at compactions, splits, log splits, region opens -- what else is there?  We'd look at each in turn to see how we survive a crash at any point during the transition.  For example, we think compactions are idempotent, but we need to prove it.  Splits for sure are not, at least not at the moment (witness disabled parents whose daughters are missing, or where only one daughter is available).

Part of this issue would be writing tests that aim to break transitions.

In light of the above, here is a recent off-list note from Todd Lipcon (and "Another"):

{code}
I thought a bit more last night about the discussion we were having
regarding various HBase components doing operations on the HDFS data,
and ensuring that in various racy scenarios we don't have two
region servers or masters overlapping.

I came to the conclusion that ZK data can't be used to actually have
effective locks on HDFS directories, since we can never know that we
still have a ZK lock when we do an operation. Thus the operations
themselves have to be idempotent, or recoverable in the case of
multiple nodes trying to do the same thing. Or, we have to use HDFS
itself as a locking mechanism - this is what we discussed: using write
leases essentially as locks.

Since I didn't really trust myself, I ran my thoughts by "Another"
and he concurs (see
below). Figured this is food for thought for designing HBase data
management to be completely safe/correct.

...

---------- Forwarded message ----------
From: Another <an...@XXXXXX.com>
Date: Wed, Feb 17, 2010 at 10:50 AM
Subject: locks
To: Todd Lipcon <to...@XXXXXXX.com>


Short answer is no, you're right.
Because HDFS and ZK are partitioned (in the sense that there's no
communication between them) and there may be an unknown delay between
acquiring the lock and performing the operation on HDFS, you have no
way of knowing that you still own the lock, like you say.
If the lock cannot be revoked while you have it (no timeouts) then you
can atomically check that you still have the lock and do the operation
on HDFS, because checking is a no-op. Designing a system with no lock
revocation in the face of failures is an exercise for the reader :)
The right way is for HDFS and ZK to communicate to construct an atomic
operation. ZK could give a token to the client which it also gives to
HDFS, and HDFS uses that token to do admission control. There's
probably some neat theorem about causality and the impossibility of
doing distributed locking without a sufficiently strong atomic
primitive here.

Another
{code}
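
The "token handed to the client which it also passes to HDFS" idea is essentially a fencing token. A toy sketch of the admission-control side, with invented interfaces (neither HDFS nor ZK exposes such an API), just to make the mechanism concrete:

{code}
// Toy sketch of fencing-token admission control; not an HDFS or ZK API.
// The lock service hands out monotonically increasing tokens (e.g. a counter,
// or a ZK znode's czxid); the storage layer rejects any operation carrying a
// token older than the newest one it has seen.
import java.util.concurrent.atomic.AtomicLong;

class FencedStorage {
  private final AtomicLong highestTokenSeen = new AtomicLong(-1);

  /** Only the current (highest-token) lock holder is admitted. */
  public void write(long fencingToken, byte[] data) {
    long seen = highestTokenSeen.updateAndGet(prev -> Math.max(prev, fencingToken));
    if (fencingToken < seen) {
      throw new IllegalStateException("stale lock holder, token=" + fencingToken);
    }
    // ... perform the actual write under this token ...
  }
}
{code}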



[jira] Commented: (HBASE-2238) Review all transitions -- compactions, splits, region opens, log splitting -- for crash-proofyness and atomicity

Posted by "Todd Lipcon (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/HBASE-2238?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12835609#action_12835609 ] 

Todd Lipcon commented on HBASE-2238:
------------------------------------

Lastly, as an optimization, we can add a step 5 on the regionserver which is "log that the state transition is entirely complete". Thus the master knows it doesn't have to do anything with regard to this transition.

For discussion, it may be worth giving some terminology to the phases. It seems to me we have _prepare_ (enters the "will roll back" state), then _commit_ (enters the "will roll forward" state), then _complete_ (ends the state machine; no action necessary).
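
A minimal sketch of that terminology as a small state machine (hypothetical names, not existing HBase code), mostly to pin down what the master does in each phase:

{code}
// Hypothetical sketch: the three phases and the master's recovery action for each.
public enum TransitionPhase {
  PREPARE,   // intent logged; if the RS dies here, the master rolls the transition back
  COMMIT,    // operation logged as done; if the RS dies here, the master rolls it forward
  COMPLETE;  // cleanup logged; nothing left for the master to do

  /** Recovery action the master would take if the regionserver died in this phase. */
  public String recoveryAction() {
    switch (this) {
      case PREPARE:  return "roll back to the pre-transition state";
      case COMMIT:   return "roll forward (finish the cleanup)";
      default:       return "no action necessary";
    }
  }
}
{code}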



[jira] Commented: (HBASE-2238) Review all transitions -- compactions, splits, region opens, log splitting -- for crash-proofyness and atomicity

Posted by "Todd Lipcon (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/HBASE-2238?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12835607#action_12835607 ] 

Todd Lipcon commented on HBASE-2238:
------------------------------------

I think the general pattern that all transitions need to follow is:

1) HLog that the RS intends to do some operation
2) Perform the operation in a way that is still undoable (e.g. create the compacted HFile but don't yet remove the old ones)
3) HLog that the RS has finished the action
4) Clean up from step 2 (e.g. remove the pre-compaction HFiles)

We assume:
- Whenever a RS has failed, the master will open its HLog for append.
- This steals the write lease and increases the generation stamp on its last block.
- Thus the next time the RS attempts to hflush(), it will receive an IOException (I think a LeaseExpiredException to be specific?)
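
(Illustrative only: a rough sketch of that master-side fencing step under the assumptions above, with a made-up class name and no error handling; it is not the actual log-splitting code.)

{code}
import java.io.IOException;
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;

public class HLogFenceSketch {
  /**
   * Master-side fencing: open the dead RS's HLog for append and close it.
   * Per the assumptions above, this steals the write lease and bumps the last
   * block's generation stamp, so the old RS's next hflush() gets an IOException.
   */
  public static void fence(Configuration conf, Path hlogPath) throws IOException {
    FileSystem fs = hlogPath.getFileSystem(conf);
    fs.append(hlogPath).close();
    // From here it is safe to split/replay the log: the old RS can no longer write it.
  }
}
{code}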

Failure cases at each step:

Fail before step 1: no problem, the data isn't touched.
Fail after step 1 but before step 3: the transition is in an indeterminate state. When the master recovers, it can roll back to the pre-transition state.
Fail after step 3: when the master recovers, it can complete the "cleanup" transition for the regionserver (even if the regionserver got halfway through cleanup).

This pattern relies on cleanup being idempotent, and state transitions being undoable.

The above examples are for the compaction case, but I think the same general ideas apply elsewhere.
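
To pin the ordering down, here is a hypothetical sketch of the compaction case in this pattern. The types and method names below are invented stand-ins (not the real HLog/Store API); the comments mark which step each call corresponds to and what the master would do on a crash at that point:

{code}
import java.io.IOException;

// Invented roles, not HBase classes.
interface TransitionLog {
  void logIntent(String op) throws IOException;     // step 1, hflushed to the HLog
  void logCommitted(String op) throws IOException;  // step 3, hflushed to the HLog
  void logComplete(String op) throws IOException;   // optional step 5
}

interface CompactionWork {
  String writeCompactedFileToTmp() throws IOException;        // step 2: old HFiles untouched
  void swapInAndDeleteOld(String tmpFile) throws IOException; // step 4: must be idempotent
}

public class CompactionTransitionSketch {
  public static void run(TransitionLog log, CompactionWork work) throws IOException {
    log.logIntent("compaction");                  // crash before this: nothing to recover
    String tmp = work.writeCompactedFileToTmp();  // crash here: master rolls back (drops tmp)
    log.logCommitted("compaction");               // crash after this: master re-runs cleanup
    work.swapInAndDeleteOld(tmp);                 // safe to repeat after a crash
    log.logComplete("compaction");                // optional: master can skip recovery entirely
  }
}
{code}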
