You are viewing a plain text version of this content. The canonical link for it is here.
Posted to dev@hbase.apache.org by Jonathan Gray <jg...@apache.org> on 2010/10/21 01:44:06 UTC

Review Request: Stale reads from ZK can break the atomic CAS operations we have in ZKAssign

-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
http://review.cloudera.org/r/1056/
-----------------------------------------------------------

Review request for hbase, Todd Lipcon and stack.


Summary
-------

Adds a sync(path) operation into ZKW and three calls into it from the CAS operations in ZKAssign.


This addresses bug HBASE-3136.
    http://issues.apache.org/jira/browse/HBASE-3136


Diffs
-----

  trunk/src/main/java/org/apache/hadoop/hbase/zookeeper/ZKAssign.java 1025790 
  trunk/src/main/java/org/apache/hadoop/hbase/zookeeper/ZooKeeperWatcher.java 1025790 

Diff: http://review.cloudera.org/r/1056/diff


Testing
-------

Still need to test more.  I'm not sure it's possible (or feasible in a reasonable amount of time) to make a unit test for this.  We'd probably need to dig into ZK or mock the hell out of stuff.


Thanks,

Jonathan


Re: Review Request: Stale reads from ZK can break the atomic CAS operations we have in ZKAssign

Posted by Jonathan Gray <jg...@apache.org>.

> On 2010-10-20 16:49:52, Todd Lipcon wrote:
> > seems OK, but we're adding couple extra ms of latency here on all of these calls. Is that going to be expensive for assigning lots of regions?
> > It seems we should be optimistic, and only really need to sync if we see unexpected state or the checked put fails?
> 
> Jonathan Gray wrote:
>     Yeah, gave that a quick shot.  It's not easy (the code gets messy quick so it needs to be well thought out).
>     
>     I'd like to commit this and we can open another jira to deal with the optimistic approach.
> 
> Todd Lipcon wrote:
>     OK, that sounds good by me. We'll fix the perf issue in a follow-up.
>     
>     Would still like Stack to review, I don't know the master code well enough to know if any other places might be missing, but I agree in concept :)

Cool.  I'm kicking off a test suite run with this patch and my hbck patch right now as well.

HBASE-3137 filed for perf improvements.


- Jonathan


-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
http://review.cloudera.org/r/1056/#review1590
-----------------------------------------------------------


On 2010-10-20 16:47:05, Jonathan Gray wrote:
> 
> -----------------------------------------------------------
> This is an automatically generated e-mail. To reply, visit:
> http://review.cloudera.org/r/1056/
> -----------------------------------------------------------
> 
> (Updated 2010-10-20 16:47:05)
> 
> 
> Review request for hbase, Todd Lipcon and stack.
> 
> 
> Summary
> -------
> 
> Adds a sync(path) operation into ZKW and three calls into it from the CAS operations in ZKAssign.
> 
> 
> This addresses bug HBASE-3136.
>     http://issues.apache.org/jira/browse/HBASE-3136
> 
> 
> Diffs
> -----
> 
>   trunk/src/main/java/org/apache/hadoop/hbase/zookeeper/ZKAssign.java 1025790 
>   trunk/src/main/java/org/apache/hadoop/hbase/zookeeper/ZooKeeperWatcher.java 1025790 
> 
> Diff: http://review.cloudera.org/r/1056/diff
> 
> 
> Testing
> -------
> 
> Still need to test more.  I'm not sure it's possible (or feasible in a reasonable amount of time) to make a unit test for this.  We'd probably need to dig into ZK or mock the hell out of stuff.
> 
> 
> Thanks,
> 
> Jonathan
> 
>


Re: Review Request: Stale reads from ZK can break the atomic CAS operations we have in ZKAssign

Posted by Jonathan Gray <jg...@apache.org>.

> On 2010-10-20 16:49:52, Todd Lipcon wrote:
> > seems OK, but we're adding couple extra ms of latency here on all of these calls. Is that going to be expensive for assigning lots of regions?
> > It seems we should be optimistic, and only really need to sync if we see unexpected state or the checked put fails?

Yeah, gave that a quick shot.  It's not easy (the code gets messy quick so it needs to be well thought out).

I'd like to commit this and we can open another jira to deal with the optimistic approach.


- Jonathan


-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
http://review.cloudera.org/r/1056/#review1590
-----------------------------------------------------------


On 2010-10-20 16:47:05, Jonathan Gray wrote:
> 
> -----------------------------------------------------------
> This is an automatically generated e-mail. To reply, visit:
> http://review.cloudera.org/r/1056/
> -----------------------------------------------------------
> 
> (Updated 2010-10-20 16:47:05)
> 
> 
> Review request for hbase, Todd Lipcon and stack.
> 
> 
> Summary
> -------
> 
> Adds a sync(path) operation into ZKW and three calls into it from the CAS operations in ZKAssign.
> 
> 
> This addresses bug HBASE-3136.
>     http://issues.apache.org/jira/browse/HBASE-3136
> 
> 
> Diffs
> -----
> 
>   trunk/src/main/java/org/apache/hadoop/hbase/zookeeper/ZKAssign.java 1025790 
>   trunk/src/main/java/org/apache/hadoop/hbase/zookeeper/ZooKeeperWatcher.java 1025790 
> 
> Diff: http://review.cloudera.org/r/1056/diff
> 
> 
> Testing
> -------
> 
> Still need to test more.  I'm not sure it's possible (or feasible in a reasonable amount of time) to make a unit test for this.  We'd probably need to dig into ZK or mock the hell out of stuff.
> 
> 
> Thanks,
> 
> Jonathan
> 
>


Re: Review Request: Stale reads from ZK can break the atomic CAS operations we have in ZKAssign

Posted by Todd Lipcon <to...@cloudera.com>.

> On 2010-10-20 16:49:52, Todd Lipcon wrote:
> > seems OK, but we're adding couple extra ms of latency here on all of these calls. Is that going to be expensive for assigning lots of regions?
> > It seems we should be optimistic, and only really need to sync if we see unexpected state or the checked put fails?
> 
> Jonathan Gray wrote:
>     Yeah, gave that a quick shot.  It's not easy (the code gets messy quick so it needs to be well thought out).
>     
>     I'd like to commit this and we can open another jira to deal with the optimistic approach.

OK, that sounds good by me. We'll fix the perf issue in a follow-up.

Would still like Stack to review, I don't know the master code well enough to know if any other places might be missing, but I agree in concept :)


- Todd


-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
http://review.cloudera.org/r/1056/#review1590
-----------------------------------------------------------


On 2010-10-20 16:47:05, Jonathan Gray wrote:
> 
> -----------------------------------------------------------
> This is an automatically generated e-mail. To reply, visit:
> http://review.cloudera.org/r/1056/
> -----------------------------------------------------------
> 
> (Updated 2010-10-20 16:47:05)
> 
> 
> Review request for hbase, Todd Lipcon and stack.
> 
> 
> Summary
> -------
> 
> Adds a sync(path) operation into ZKW and three calls into it from the CAS operations in ZKAssign.
> 
> 
> This addresses bug HBASE-3136.
>     http://issues.apache.org/jira/browse/HBASE-3136
> 
> 
> Diffs
> -----
> 
>   trunk/src/main/java/org/apache/hadoop/hbase/zookeeper/ZKAssign.java 1025790 
>   trunk/src/main/java/org/apache/hadoop/hbase/zookeeper/ZooKeeperWatcher.java 1025790 
> 
> Diff: http://review.cloudera.org/r/1056/diff
> 
> 
> Testing
> -------
> 
> Still need to test more.  I'm not sure it's possible (or feasible in a reasonable amount of time) to make a unit test for this.  We'd probably need to dig into ZK or mock the hell out of stuff.
> 
> 
> Thanks,
> 
> Jonathan
> 
>


Re: Review Request: Stale reads from ZK can break the atomic CAS operations we have in ZKAssign

Posted by Todd Lipcon <to...@cloudera.com>.
-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
http://review.cloudera.org/r/1056/#review1590
-----------------------------------------------------------


seems OK, but we're adding couple extra ms of latency here on all of these calls. Is that going to be expensive for assigning lots of regions?
It seems we should be optimistic, and only really need to sync if we see unexpected state or the checked put fails?

- Todd


On 2010-10-20 16:47:05, Jonathan Gray wrote:
> 
> -----------------------------------------------------------
> This is an automatically generated e-mail. To reply, visit:
> http://review.cloudera.org/r/1056/
> -----------------------------------------------------------
> 
> (Updated 2010-10-20 16:47:05)
> 
> 
> Review request for hbase, Todd Lipcon and stack.
> 
> 
> Summary
> -------
> 
> Adds a sync(path) operation into ZKW and three calls into it from the CAS operations in ZKAssign.
> 
> 
> This addresses bug HBASE-3136.
>     http://issues.apache.org/jira/browse/HBASE-3136
> 
> 
> Diffs
> -----
> 
>   trunk/src/main/java/org/apache/hadoop/hbase/zookeeper/ZKAssign.java 1025790 
>   trunk/src/main/java/org/apache/hadoop/hbase/zookeeper/ZooKeeperWatcher.java 1025790 
> 
> Diff: http://review.cloudera.org/r/1056/diff
> 
> 
> Testing
> -------
> 
> Still need to test more.  I'm not sure it's possible (or feasible in a reasonable amount of time) to make a unit test for this.  We'd probably need to dig into ZK or mock the hell out of stuff.
> 
> 
> Thanks,
> 
> Jonathan
> 
>


Re: Review Request: Stale reads from ZK can break the atomic CAS operations we have in ZKAssign

Posted by st...@duboce.net.
-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
http://review.cloudera.org/r/1056/#review1620
-----------------------------------------------------------

Ship it!


+1 on getting it in as is.  I think its going to kill my bulk assign improvements but yesterday discussing w/ Jon and Karthik, there are plenty of options to explore yet... We'll do in another issue.

- stack


On 2010-10-20 16:47:05, Jonathan Gray wrote:
> 
> -----------------------------------------------------------
> This is an automatically generated e-mail. To reply, visit:
> http://review.cloudera.org/r/1056/
> -----------------------------------------------------------
> 
> (Updated 2010-10-20 16:47:05)
> 
> 
> Review request for hbase, Todd Lipcon and stack.
> 
> 
> Summary
> -------
> 
> Adds a sync(path) operation into ZKW and three calls into it from the CAS operations in ZKAssign.
> 
> 
> This addresses bug HBASE-3136.
>     http://issues.apache.org/jira/browse/HBASE-3136
> 
> 
> Diffs
> -----
> 
>   trunk/src/main/java/org/apache/hadoop/hbase/zookeeper/ZKAssign.java 1025790 
>   trunk/src/main/java/org/apache/hadoop/hbase/zookeeper/ZooKeeperWatcher.java 1025790 
> 
> Diff: http://review.cloudera.org/r/1056/diff
> 
> 
> Testing
> -------
> 
> Still need to test more.  I'm not sure it's possible (or feasible in a reasonable amount of time) to make a unit test for this.  We'd probably need to dig into ZK or mock the hell out of stuff.
> 
> 
> Thanks,
> 
> Jonathan
> 
>


Re: Review Request: Stale reads from ZK can break the atomic CAS operations we have in ZKAssign

Posted by Jonathan Gray <jg...@apache.org>.
-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
http://review.cloudera.org/r/1056/
-----------------------------------------------------------

(Updated 2010-10-20 16:47:05.987539)


Review request for hbase, Todd Lipcon and stack.


Changes
-------

Last patch didn't apply for some reason, made a new one.


Summary
-------

Adds a sync(path) operation into ZKW and three calls into it from the CAS operations in ZKAssign.


This addresses bug HBASE-3136.
    http://issues.apache.org/jira/browse/HBASE-3136


Diffs (updated)
-----

  trunk/src/main/java/org/apache/hadoop/hbase/zookeeper/ZKAssign.java 1025790 
  trunk/src/main/java/org/apache/hadoop/hbase/zookeeper/ZooKeeperWatcher.java 1025790 

Diff: http://review.cloudera.org/r/1056/diff


Testing
-------

Still need to test more.  I'm not sure it's possible (or feasible in a reasonable amount of time) to make a unit test for this.  We'd probably need to dig into ZK or mock the hell out of stuff.


Thanks,

Jonathan