You are viewing a plain text version of this content. The canonical link for it is here.
Posted to issues@hbase.apache.org by "nkeywal (JIRA)" <ji...@apache.org> on 2012/11/30 18:51:59 UTC

[jira] [Created] (HBASE-7247) Assignment performances decreased by 50% because of regionserver.OpenRegionHandler#tickleOpening

nkeywal created HBASE-7247:
------------------------------

             Summary: Assignment performances decreased by 50% because of regionserver.OpenRegionHandler#tickleOpening
                 Key: HBASE-7247
                 URL: https://issues.apache.org/jira/browse/HBASE-7247
             Project: HBase
          Issue Type: Improvement
          Components: master, Region Assignment, regionserver
    Affects Versions: 0.96.0
            Reporter: nkeywal
            Assignee: nkeywal
            Priority: Critical


The regionserver.OpenRegionHandler#tickleOpening updates the region znode as "Do this so master doesn't timeout this region-in-transition.".
However, on the usual test, this makes the assignment time of 1500 regions goes from 70s to 100s, that is, we're 50% slower because of this.
More generally, ZooKeper commits to disk all the data update, and this takes time. Using it to provide a keep alive seems overkill. At the very list, it could be made asynchronous.

I'm not sure how necessary these updates are required (I need to go deeper in the internal, feedback welcome), but it seems very important to optimize this... The trival fix would be to make this optional.


--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira

[jira] [Commented] (HBASE-7247) Assignment performances decreased by 50% because of regionserver.OpenRegionHandler#tickleOpening

Posted by "stack (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/HBASE-7247?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13507621#comment-13507621 ] 

stack commented on HBASE-7247:
------------------------------

bq. If one region server is opening a lot of regions, we just need one handler to tickle the opening. 

This would be a significant change for I'm-not-sure-what-benefit.  The zk transactions are region level/scoped and their handling is done at this level in open/close exec handlers.  Making it so RS does tracking and updating state for master to read regards CLOSING/OPENING would alter a bunch of code.

I'm all for a reexamination of base operations.  Stuff is this way because we would have issues where an open would stall for whatever reason... Master would intercept the open, take over the region and give it to someone else to open.  More often than not, we'd just fail again for same reason on the new location but the odd time the new re-attempt would succeed.

We could give up reattempt and just let everything hinge on whether a region server has a zk lease or not and let ServerShutdownHandler do it all.  It'd be a pretty radical difference.  Simplify code but also, in an odd case, it might mean we'd fail recover a region (I don't have stats on this).

bq. We used to have the 'owernership' issue as Stack mentioned. Now, I think we are fine since AM should have a consistent view of region states.

This is a similar type of leap-in-the-dark (smile).  I love the notion that AM is now rock solid.  It may be given the work expended (it certainly is a million times better) but I'd like us to run w/ the new AM a while in a few productions before making this ruling.

Again, if we could undo the OPENING/CLOSING, etc., stuff would be cleaner/simpler (logs would be way less noisy)


                
> Assignment performances decreased by 50% because of regionserver.OpenRegionHandler#tickleOpening
> ------------------------------------------------------------------------------------------------
>
>                 Key: HBASE-7247
>                 URL: https://issues.apache.org/jira/browse/HBASE-7247
>             Project: HBase
>          Issue Type: Improvement
>          Components: master, Region Assignment, regionserver
>    Affects Versions: 0.96.0
>            Reporter: nkeywal
>            Assignee: nkeywal
>            Priority: Critical
>
> The regionserver.OpenRegionHandler#tickleOpening updates the region znode as "Do this so master doesn't timeout this region-in-transition.".
> However, on the usual test, this makes the assignment time of 1500 regions goes from 70s to 100s, that is, we're 50% slower because of this.
> More generally, ZooKeper commits to disk all the data update, and this takes time. Using it to provide a keep alive seems overkill. At the very list, it could be made asynchronous.
> I'm not sure how necessary these updates are required (I need to go deeper in the internal, feedback welcome), but it seems very important to optimize this... The trival fix would be to make this optional.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira

[jira] [Comment Edited] (HBASE-7247) Assignment performances decreased by 50% because of regionserver.OpenRegionHandler#tickleOpening

Posted by "nkeywal (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/HBASE-7247?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13509789#comment-13509789 ] 

nkeywal edited comment on HBASE-7247 at 12/4/12 3:27 PM:
---------------------------------------------------------

bq. Do we not have this currently? If someone else changes the znode, we don't notice? We only notice when we go to update the znode?


I haven't found where where we do it. The result of tickleOpening is actually often ignored as well, or at least ignored for a long time. 
for example, in OpenRegionHandler#updateMeta(final HRegion r), we have 

{code}
tickleOpening = true(
while (/* condition, but not on tickleOpening */){
  // do something
  tickleOpening = tickleOpening();
}

return something && tickleOpening;
{code}

In theory, we should break the loop when tickleOpening becomes false.


While the way it's written, it seems that we can have a failure once, then a success.
Basically, it seems that tickleOpening is not always used as a check.

                
      was (Author: nkeywal):
    bq. Do we not have this currently? If someone else changes the znode, we don't notice? We only notice when we go to update the znode?


I haven't found where where we do it. The result of tickleOpening is actually often ignored as well, or at least ignored for a long time. 
for example, in OpenRegionHandler#updateMeta(final HRegion r), we have 

{code}
tickleOpening = true(
while (/* condition, but not on tickleOpening */){
  // do something
  tickleOpening = tickleOpening();
}

return something && tickleOpening;
{code}

In theory, we should break the loop when tickleOpening becomes false.


While the way it's written, it seems that we can have a failure once, then a success.
Basically, it s eems that tickleOpening is not always used as a check.

There is about the same code in OpenRegionHandler#process (i.e. an error in tickleOpening does not interrupt the process). 
                  
> Assignment performances decreased by 50% because of regionserver.OpenRegionHandler#tickleOpening
> ------------------------------------------------------------------------------------------------
>
>                 Key: HBASE-7247
>                 URL: https://issues.apache.org/jira/browse/HBASE-7247
>             Project: HBase
>          Issue Type: Improvement
>          Components: master, Region Assignment, regionserver
>    Affects Versions: 0.96.0
>            Reporter: nkeywal
>            Assignee: nkeywal
>            Priority: Critical
>
> The regionserver.OpenRegionHandler#tickleOpening updates the region znode as "Do this so master doesn't timeout this region-in-transition.".
> However, on the usual test, this makes the assignment time of 1500 regions goes from 70s to 100s, that is, we're 50% slower because of this.
> More generally, ZooKeper commits to disk all the data update, and this takes time. Using it to provide a keep alive seems overkill. At the very list, it could be made asynchronous.
> I'm not sure how necessary these updates are required (I need to go deeper in the internal, feedback welcome), but it seems very important to optimize this... The trival fix would be to make this optional.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira

[jira] [Commented] (HBASE-7247) Assignment performances decreased by 50% because of regionserver.OpenRegionHandler#tickleOpening

Posted by "stack (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/HBASE-7247?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13510135#comment-13510135 ] 

stack commented on HBASE-7247:
------------------------------

bq. In theory, we should break the loop when tickleOpening becomes false.

Yes.  That is a bug.
                
> Assignment performances decreased by 50% because of regionserver.OpenRegionHandler#tickleOpening
> ------------------------------------------------------------------------------------------------
>
>                 Key: HBASE-7247
>                 URL: https://issues.apache.org/jira/browse/HBASE-7247
>             Project: HBase
>          Issue Type: Improvement
>          Components: master, Region Assignment, regionserver
>    Affects Versions: 0.96.0
>            Reporter: nkeywal
>            Assignee: nkeywal
>            Priority: Critical
>
> The regionserver.OpenRegionHandler#tickleOpening updates the region znode as "Do this so master doesn't timeout this region-in-transition.".
> However, on the usual test, this makes the assignment time of 1500 regions goes from 70s to 100s, that is, we're 50% slower because of this.
> More generally, ZooKeper commits to disk all the data update, and this takes time. Using it to provide a keep alive seems overkill. At the very list, it could be made asynchronous.
> I'm not sure how necessary these updates are required (I need to go deeper in the internal, feedback welcome), but it seems very important to optimize this... The trival fix would be to make this optional.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira

[jira] [Commented] (HBASE-7247) Assignment performances decreased by 50% because of regionserver.OpenRegionHandler#tickleOpening

Posted by "Andrew Purtell (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/HBASE-7247?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13511034#comment-13511034 ] 

Andrew Purtell commented on HBASE-7247:
---------------------------------------

{quote}
bq. Stepping back (after looking at code), could we drop the notion that a master can intercede and assign a region elsewhere because it is proceeding too slow on a particular region in the name of simplifying the region open handling interaction? There would be less noise in the logs and less states to deal with.
It would be a huge simplification imho. It's worth trying, I would say. It actually makes sense to do it now, because once the current trunk code will be production proven, touching it will be scarier.
{quote}

+1

                
> Assignment performances decreased by 50% because of regionserver.OpenRegionHandler#tickleOpening
> ------------------------------------------------------------------------------------------------
>
>                 Key: HBASE-7247
>                 URL: https://issues.apache.org/jira/browse/HBASE-7247
>             Project: HBase
>          Issue Type: Improvement
>          Components: master, Region Assignment, regionserver
>    Affects Versions: 0.96.0
>            Reporter: nkeywal
>            Assignee: nkeywal
>            Priority: Critical
>             Fix For: 0.96.0
>
>         Attachments: 7247.v1.patch
>
>
> The regionserver.OpenRegionHandler#tickleOpening updates the region znode as "Do this so master doesn't timeout this region-in-transition.".
> However, on the usual test, this makes the assignment time of 1500 regions goes from 70s to 100s, that is, we're 50% slower because of this.
> More generally, ZooKeper commits to disk all the data update, and this takes time. Using it to provide a keep alive seems overkill. At the very list, it could be made asynchronous.
> I'm not sure how necessary these updates are required (I need to go deeper in the internal, feedback welcome), but it seems very important to optimize this... The trival fix would be to make this optional.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira

[jira] [Commented] (HBASE-7247) Assignment performances decreased by 50% because of regionserver.OpenRegionHandler#tickleOpening

Posted by "nkeywal (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/HBASE-7247?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13508916#comment-13508916 ] 

nkeywal commented on HBASE-7247:
--------------------------------

An possible solution could be:
- use a watcher to see if someone else is updating the znode
- use an asynchronous write to update the znode when we want to say to the master we're still there.

It mostly as of today, except that we suppressed the synchronous write on the regionserver path.
Pros: should be a small enough change.
Cons: we're still loading ZK (and the master as a side effect).

What do you think?
                
> Assignment performances decreased by 50% because of regionserver.OpenRegionHandler#tickleOpening
> ------------------------------------------------------------------------------------------------
>
>                 Key: HBASE-7247
>                 URL: https://issues.apache.org/jira/browse/HBASE-7247
>             Project: HBase
>          Issue Type: Improvement
>          Components: master, Region Assignment, regionserver
>    Affects Versions: 0.96.0
>            Reporter: nkeywal
>            Assignee: nkeywal
>            Priority: Critical
>
> The regionserver.OpenRegionHandler#tickleOpening updates the region znode as "Do this so master doesn't timeout this region-in-transition.".
> However, on the usual test, this makes the assignment time of 1500 regions goes from 70s to 100s, that is, we're 50% slower because of this.
> More generally, ZooKeper commits to disk all the data update, and this takes time. Using it to provide a keep alive seems overkill. At the very list, it could be made asynchronous.
> I'm not sure how necessary these updates are required (I need to go deeper in the internal, feedback welcome), but it seems very important to optimize this... The trival fix would be to make this optional.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira

[jira] [Commented] (HBASE-7247) Assignment performances decreased by 50% because of regionserver.OpenRegionHandler#tickleOpening

Posted by "nkeywal (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/HBASE-7247?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13510452#comment-13510452 ] 

nkeywal commented on HBASE-7247:
--------------------------------

The patch is getting thinner and thinner...
Now I just check that the znode was not updated recently before writing it again. On the test, it brings half of was I have it I remove totally tickleOpening.

There is something there as well, in  OpenRegionHandler#process()

{code}
      boolean failed = true;
      if (tickleOpening("post_region_open")) {
        if (updateMeta(region)) {
          failed = false;
        }
      }
      if (failed || this.server.isStopped() ||
{code}

This 'tickleOpening' is called whatever the time spent previously, before updating meta. If we remove it completely, we save a sync.

                
> Assignment performances decreased by 50% because of regionserver.OpenRegionHandler#tickleOpening
> ------------------------------------------------------------------------------------------------
>
>                 Key: HBASE-7247
>                 URL: https://issues.apache.org/jira/browse/HBASE-7247
>             Project: HBase
>          Issue Type: Improvement
>          Components: master, Region Assignment, regionserver
>    Affects Versions: 0.96.0
>            Reporter: nkeywal
>            Assignee: nkeywal
>            Priority: Critical
>             Fix For: 0.96.0
>
>         Attachments: 7247.v1.patch
>
>
> The regionserver.OpenRegionHandler#tickleOpening updates the region znode as "Do this so master doesn't timeout this region-in-transition.".
> However, on the usual test, this makes the assignment time of 1500 regions goes from 70s to 100s, that is, we're 50% slower because of this.
> More generally, ZooKeper commits to disk all the data update, and this takes time. Using it to provide a keep alive seems overkill. At the very list, it could be made asynchronous.
> I'm not sure how necessary these updates are required (I need to go deeper in the internal, feedback welcome), but it seems very important to optimize this... The trival fix would be to make this optional.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira

[jira] [Commented] (HBASE-7247) Assignment performances decreased by 50% because of regionserver.OpenRegionHandler#tickleOpening

Posted by "stack (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/HBASE-7247?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13508939#comment-13508939 ] 

stack commented on HBASE-7247:
------------------------------

bq. use a watcher to see if someone else is updating the znode

Do we not have this currently?  If someone else changes the znode, we don't notice?  We only notice when we go to update the znode?

Should we list how many znode updates happen for a region open?

Async'ing the OPENINGs so master gets a tickle that the OPEN is progressing sounds fine... We don't actually move any files around on OPEN so its not a 'problem' if not the owner, really... Its only later after compaction, etc., that the damage is done... that is what we should prevent happening for sure.
                
> Assignment performances decreased by 50% because of regionserver.OpenRegionHandler#tickleOpening
> ------------------------------------------------------------------------------------------------
>
>                 Key: HBASE-7247
>                 URL: https://issues.apache.org/jira/browse/HBASE-7247
>             Project: HBase
>          Issue Type: Improvement
>          Components: master, Region Assignment, regionserver
>    Affects Versions: 0.96.0
>            Reporter: nkeywal
>            Assignee: nkeywal
>            Priority: Critical
>
> The regionserver.OpenRegionHandler#tickleOpening updates the region znode as "Do this so master doesn't timeout this region-in-transition.".
> However, on the usual test, this makes the assignment time of 1500 regions goes from 70s to 100s, that is, we're 50% slower because of this.
> More generally, ZooKeper commits to disk all the data update, and this takes time. Using it to provide a keep alive seems overkill. At the very list, it could be made asynchronous.
> I'm not sure how necessary these updates are required (I need to go deeper in the internal, feedback welcome), but it seems very important to optimize this... The trival fix would be to make this optional.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira

[jira] [Commented] (HBASE-7247) Assignment performances decreased by 50% because of regionserver.OpenRegionHandler#tickleOpening

Posted by "stack (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/HBASE-7247?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13507536#comment-13507536 ] 

stack commented on HBASE-7247:
------------------------------

Moving znode from OPENING to OPENING or CLOSING to CLOSING ensuring the sequence numbers are as we expect is how we check we still have 'ownership' of the region before we go about making alterations in the filesystem.  The callback the master gets when we 'update' the znode is used to flag the master that the region server is still 'alive' and working on the opening so the master updates its opening timer when it gets the callback.

It is imperfect in that we could lose the lease between the check and the fs operation.

Alternatives would be to remove this mechanism and instead just rely on our getting a callback if the znode is taken from us?  This would happen in another thread.  Would have to be watching.  This would seem to coarser than what we currently have widening the window during which we could do fs operations on a region though we've lost ownership.  Master would not get notification that a region server is opening a region only its taking longer than usual.

Any other suggestions?
                
> Assignment performances decreased by 50% because of regionserver.OpenRegionHandler#tickleOpening
> ------------------------------------------------------------------------------------------------
>
>                 Key: HBASE-7247
>                 URL: https://issues.apache.org/jira/browse/HBASE-7247
>             Project: HBase
>          Issue Type: Improvement
>          Components: master, Region Assignment, regionserver
>    Affects Versions: 0.96.0
>            Reporter: nkeywal
>            Assignee: nkeywal
>            Priority: Critical
>
> The regionserver.OpenRegionHandler#tickleOpening updates the region znode as "Do this so master doesn't timeout this region-in-transition.".
> However, on the usual test, this makes the assignment time of 1500 regions goes from 70s to 100s, that is, we're 50% slower because of this.
> More generally, ZooKeper commits to disk all the data update, and this takes time. Using it to provide a keep alive seems overkill. At the very list, it could be made asynchronous.
> I'm not sure how necessary these updates are required (I need to go deeper in the internal, feedback welcome), but it seems very important to optimize this... The trival fix would be to make this optional.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira

[jira] [Commented] (HBASE-7247) Assignment performances decreased by 50% because of regionserver.OpenRegionHandler#tickleOpening

Posted by "stack (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/HBASE-7247?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13507598#comment-13507598 ] 

stack commented on HBASE-7247:
------------------------------

bq. ...the callback would write a variable that would be tested by tickleOpening, and we're done. And as the test cost is cheap, we can test more often.

You mean you'd keep writing the znode, just async?

bq. May be the master could ask to the region server if it's ok? 

So master would ping after every region that is opening?  Would this be better/less costly than what we have now?

In past, before we moved assign to all-zk, on the region server heartbeat to the master, the RS would include the list of OPENING regions.
                
> Assignment performances decreased by 50% because of regionserver.OpenRegionHandler#tickleOpening
> ------------------------------------------------------------------------------------------------
>
>                 Key: HBASE-7247
>                 URL: https://issues.apache.org/jira/browse/HBASE-7247
>             Project: HBase
>          Issue Type: Improvement
>          Components: master, Region Assignment, regionserver
>    Affects Versions: 0.96.0
>            Reporter: nkeywal
>            Assignee: nkeywal
>            Priority: Critical
>
> The regionserver.OpenRegionHandler#tickleOpening updates the region znode as "Do this so master doesn't timeout this region-in-transition.".
> However, on the usual test, this makes the assignment time of 1500 regions goes from 70s to 100s, that is, we're 50% slower because of this.
> More generally, ZooKeper commits to disk all the data update, and this takes time. Using it to provide a keep alive seems overkill. At the very list, it could be made asynchronous.
> I'm not sure how necessary these updates are required (I need to go deeper in the internal, feedback welcome), but it seems very important to optimize this... The trival fix would be to make this optional.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira

[jira] [Updated] (HBASE-7247) Assignment performances decreased by 50% because of regionserver.OpenRegionHandler#tickleOpening

Posted by "nkeywal (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/HBASE-7247?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

nkeywal updated HBASE-7247:
---------------------------

    Fix Version/s: 0.96.0
           Status: Patch Available  (was: Open)
    
> Assignment performances decreased by 50% because of regionserver.OpenRegionHandler#tickleOpening
> ------------------------------------------------------------------------------------------------
>
>                 Key: HBASE-7247
>                 URL: https://issues.apache.org/jira/browse/HBASE-7247
>             Project: HBase
>          Issue Type: Improvement
>          Components: master, Region Assignment, regionserver
>    Affects Versions: 0.96.0
>            Reporter: nkeywal
>            Assignee: nkeywal
>            Priority: Critical
>             Fix For: 0.96.0
>
>         Attachments: 7247.v1.patch
>
>
> The regionserver.OpenRegionHandler#tickleOpening updates the region znode as "Do this so master doesn't timeout this region-in-transition.".
> However, on the usual test, this makes the assignment time of 1500 regions goes from 70s to 100s, that is, we're 50% slower because of this.
> More generally, ZooKeper commits to disk all the data update, and this takes time. Using it to provide a keep alive seems overkill. At the very list, it could be made asynchronous.
> I'm not sure how necessary these updates are required (I need to go deeper in the internal, feedback welcome), but it seems very important to optimize this... The trival fix would be to make this optional.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira

[jira] [Commented] (HBASE-7247) Assignment performances decreased by 50% because of regionserver.OpenRegionHandler#tickleOpening

Posted by "stack (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/HBASE-7247?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13510622#comment-13510622 ] 

stack commented on HBASE-7247:
------------------------------

Your approach sounds good nkeywal.  No harm writing less but adding the read in case we indeed have lost ownership.  It might mean it takes a bit longer to realize we've lost ownership but that should be fine.

On the 'post_region_open', IIRC, a bunch of these tickleOpenings were added because we saw issues... in this case, an update of .META. that went in though we'd lost ownership of the region.

Stepping back (after looking at code), could we drop the notion that a master can intercede and assign a region elsewhere because it is proceeding too slow on a particular region in the name of simplifying the region open handling interaction?  There would be less noise in the logs and less states to deal with.

If we did this, I'd think that we'd want the regionserver to do the initial move of the znode from OPENING to OPENING to establish ownership (elsewhere I have petitioned that the regionserver should set the OPENING state, and not the master -- master should set the state to PENDING_OPEN in the znode rather than just in master memory -- as a means of cleanly denoting the regionservers' assumption of region ownership).  Then the next transition would be from OPENING to OPEN or to FAILED_OPEN.  And that would be it.  Master would just presume that the only reason to intercede is when the regionserver loses its lease in zk.  We'd drop tickling OPENING so master knows we are progressing on a region open -- it would just presume regionserver is making progress and that it will kill itself if it can't get to HDFS, etc.  We'd also be dropping regionserver-side checks that it still owns a region just before it goes to update meta (Could change the meta operation to be a check and put so we didn't have to go to zk just before meta edit -- the bit of code you quote above nkeywal?).

Just putting it out there.

On patch:


Do we fail the open if the following break happens?

{code}
         tickleOpening = tickleOpening("post_open_deploy");
+        if (!tickleOpening) {
+          break;
+        }
{code}

We have to call the +    zkw.sync(node); ?  We always did that?  We are doing the sync just to read the old znode value?  Do we have to?  Could we operate w/ stale read?

Are you removing this:

-      HRegion region = this.rsServices.getFromOnlineRegions(encodedName);

If so, why or have you moved the check later?

                
> Assignment performances decreased by 50% because of regionserver.OpenRegionHandler#tickleOpening
> ------------------------------------------------------------------------------------------------
>
>                 Key: HBASE-7247
>                 URL: https://issues.apache.org/jira/browse/HBASE-7247
>             Project: HBase
>          Issue Type: Improvement
>          Components: master, Region Assignment, regionserver
>    Affects Versions: 0.96.0
>            Reporter: nkeywal
>            Assignee: nkeywal
>            Priority: Critical
>             Fix For: 0.96.0
>
>         Attachments: 7247.v1.patch
>
>
> The regionserver.OpenRegionHandler#tickleOpening updates the region znode as "Do this so master doesn't timeout this region-in-transition.".
> However, on the usual test, this makes the assignment time of 1500 regions goes from 70s to 100s, that is, we're 50% slower because of this.
> More generally, ZooKeper commits to disk all the data update, and this takes time. Using it to provide a keep alive seems overkill. At the very list, it could be made asynchronous.
> I'm not sure how necessary these updates are required (I need to go deeper in the internal, feedback welcome), but it seems very important to optimize this... The trival fix would be to make this optional.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira

[jira] [Commented] (HBASE-7247) Assignment performances decreased by 50% because of regionserver.OpenRegionHandler#tickleOpening

Posted by "stack (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/HBASE-7247?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13510712#comment-13510712 ] 

stack commented on HBASE-7247:
------------------------------

bq. Implicitly, it means we still have a race condition here, just that the probability is quite low.

Yeah.  By-product of our keeping state across multiple systems (up in zk and then some state in .meta.).  We could change this to a checkAndPut.  Read .META. at start of the opening or have master pass over the .META. timestamp or something key to .META. and we'd use it doing checkAndSet into .META. table... would be more strict than this updating zk.

bq. It would be a huge simplification imho. It's worth trying, I would say. It actually makes sense to do it now, because once the current trunk code will be production proven, touching it will be scarier.

I'd go along.  We should discuss out on dev first.  I have short-term memory.  I'm currently of the opinion that this expensive facility of master failing an open because it has been taking too long on a particular regionserver has been of no use -- worse, it has only caused headache -- but I may be just not remembering and others out on dev list will have better recall than I.


                
> Assignment performances decreased by 50% because of regionserver.OpenRegionHandler#tickleOpening
> ------------------------------------------------------------------------------------------------
>
>                 Key: HBASE-7247
>                 URL: https://issues.apache.org/jira/browse/HBASE-7247
>             Project: HBase
>          Issue Type: Improvement
>          Components: master, Region Assignment, regionserver
>    Affects Versions: 0.96.0
>            Reporter: nkeywal
>            Assignee: nkeywal
>            Priority: Critical
>             Fix For: 0.96.0
>
>         Attachments: 7247.v1.patch
>
>
> The regionserver.OpenRegionHandler#tickleOpening updates the region znode as "Do this so master doesn't timeout this region-in-transition.".
> However, on the usual test, this makes the assignment time of 1500 regions goes from 70s to 100s, that is, we're 50% slower because of this.
> More generally, ZooKeper commits to disk all the data update, and this takes time. Using it to provide a keep alive seems overkill. At the very list, it could be made asynchronous.
> I'm not sure how necessary these updates are required (I need to go deeper in the internal, feedback welcome), but it seems very important to optimize this... The trival fix would be to make this optional.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira

[jira] [Commented] (HBASE-7247) Assignment performances decreased by 50% because of regionserver.OpenRegionHandler#tickleOpening

Posted by "nkeywal (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/HBASE-7247?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13509789#comment-13509789 ] 

nkeywal commented on HBASE-7247:
--------------------------------

bq. Do we not have this currently? If someone else changes the znode, we don't notice? We only notice when we go to update the znode?


I haven't found where where we do it. The result of tickleOpening is actually often ignored as well, or at least ignored for a long time. 
for example, in OpenRegionHandler#updateMeta(final HRegion r), we have 

{code}
tickleOpening = true(
while (/* condition, but not on tickleOpening */){
  // do something
  tickleOpening = tickleOpening();
}

return something && tickleOpening;
{code}

In theory, we should break the loop when tickleOpening becomes false.


While the way it's written, it seems that we can have a failure once, then a success.
Basically, it s eems that tickleOpening is not always used as a check.

There is about the same code in OpenRegionHandler#process (i.e. an error in tickleOpening does not interrupt the process). 
                
> Assignment performances decreased by 50% because of regionserver.OpenRegionHandler#tickleOpening
> ------------------------------------------------------------------------------------------------
>
>                 Key: HBASE-7247
>                 URL: https://issues.apache.org/jira/browse/HBASE-7247
>             Project: HBase
>          Issue Type: Improvement
>          Components: master, Region Assignment, regionserver
>    Affects Versions: 0.96.0
>            Reporter: nkeywal
>            Assignee: nkeywal
>            Priority: Critical
>
> The regionserver.OpenRegionHandler#tickleOpening updates the region znode as "Do this so master doesn't timeout this region-in-transition.".
> However, on the usual test, this makes the assignment time of 1500 regions goes from 70s to 100s, that is, we're 50% slower because of this.
> More generally, ZooKeper commits to disk all the data update, and this takes time. Using it to provide a keep alive seems overkill. At the very list, it could be made asynchronous.
> I'm not sure how necessary these updates are required (I need to go deeper in the internal, feedback welcome), but it seems very important to optimize this... The trival fix would be to make this optional.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira

[jira] [Commented] (HBASE-7247) Assignment performances decreased by 50% because of regionserver.OpenRegionHandler#tickleOpening

Posted by "nkeywal (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/HBASE-7247?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13509968#comment-13509968 ] 

nkeywal commented on HBASE-7247:
--------------------------------

I'm a little bit scared with the watchers: I would not like to have race conditions (if the region is opened twice for example). I'm trying something with only the async part... 
                
> Assignment performances decreased by 50% because of regionserver.OpenRegionHandler#tickleOpening
> ------------------------------------------------------------------------------------------------
>
>                 Key: HBASE-7247
>                 URL: https://issues.apache.org/jira/browse/HBASE-7247
>             Project: HBase
>          Issue Type: Improvement
>          Components: master, Region Assignment, regionserver
>    Affects Versions: 0.96.0
>            Reporter: nkeywal
>            Assignee: nkeywal
>            Priority: Critical
>
> The regionserver.OpenRegionHandler#tickleOpening updates the region znode as "Do this so master doesn't timeout this region-in-transition.".
> However, on the usual test, this makes the assignment time of 1500 regions goes from 70s to 100s, that is, we're 50% slower because of this.
> More generally, ZooKeper commits to disk all the data update, and this takes time. Using it to provide a keep alive seems overkill. At the very list, it could be made asynchronous.
> I'm not sure how necessary these updates are required (I need to go deeper in the internal, feedback welcome), but it seems very important to optimize this... The trival fix would be to make this optional.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira

[jira] [Commented] (HBASE-7247) Assignment performances decreased by 50% because of regionserver.OpenRegionHandler#tickleOpening

Posted by "nkeywal (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/HBASE-7247?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13507570#comment-13507570 ] 

nkeywal commented on HBASE-7247:
--------------------------------

For the regionserver side, the callback won't be very difficult to do I think: the callback would write a variable that would be tested by tickleOpening, and we're done. And as the test cost is cheap, we can test more often.

For master, hum... there are different options. May be the master could ask to the region server if it's ok? This would be done if there's no feedback after a while and if the regionserver is still alive from a zookeeper point if view? This would allow to test multiple regions simultaneously (all the regions of this regionserver).

                
> Assignment performances decreased by 50% because of regionserver.OpenRegionHandler#tickleOpening
> ------------------------------------------------------------------------------------------------
>
>                 Key: HBASE-7247
>                 URL: https://issues.apache.org/jira/browse/HBASE-7247
>             Project: HBase
>          Issue Type: Improvement
>          Components: master, Region Assignment, regionserver
>    Affects Versions: 0.96.0
>            Reporter: nkeywal
>            Assignee: nkeywal
>            Priority: Critical
>
> The regionserver.OpenRegionHandler#tickleOpening updates the region znode as "Do this so master doesn't timeout this region-in-transition.".
> However, on the usual test, this makes the assignment time of 1500 regions goes from 70s to 100s, that is, we're 50% slower because of this.
> More generally, ZooKeper commits to disk all the data update, and this takes time. Using it to provide a keep alive seems overkill. At the very list, it could be made asynchronous.
> I'm not sure how necessary these updates are required (I need to go deeper in the internal, feedback welcome), but it seems very important to optimize this... The trival fix would be to make this optional.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira

[jira] [Commented] (HBASE-7247) Assignment performances decreased by 50% because of regionserver.OpenRegionHandler#tickleOpening

Posted by "nkeywal (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/HBASE-7247?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13510363#comment-13510363 ] 

nkeywal commented on HBASE-7247:
--------------------------------

I will fix it with the other stuff so. Currently my patch does not work locally, but I'm making progress.
                
> Assignment performances decreased by 50% because of regionserver.OpenRegionHandler#tickleOpening
> ------------------------------------------------------------------------------------------------
>
>                 Key: HBASE-7247
>                 URL: https://issues.apache.org/jira/browse/HBASE-7247
>             Project: HBase
>          Issue Type: Improvement
>          Components: master, Region Assignment, regionserver
>    Affects Versions: 0.96.0
>            Reporter: nkeywal
>            Assignee: nkeywal
>            Priority: Critical
>
> The regionserver.OpenRegionHandler#tickleOpening updates the region znode as "Do this so master doesn't timeout this region-in-transition.".
> However, on the usual test, this makes the assignment time of 1500 regions goes from 70s to 100s, that is, we're 50% slower because of this.
> More generally, ZooKeper commits to disk all the data update, and this takes time. Using it to provide a keep alive seems overkill. At the very list, it could be made asynchronous.
> I'm not sure how necessary these updates are required (I need to go deeper in the internal, feedback welcome), but it seems very important to optimize this... The trival fix would be to make this optional.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira

[jira] [Commented] (HBASE-7247) Assignment performances decreased by 50% because of regionserver.OpenRegionHandler#tickleOpening

Posted by "ramkrishna.s.vasudevan (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/HBASE-7247?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13509811#comment-13509811 ] 

ramkrishna.s.vasudevan commented on HBASE-7247:
-----------------------------------------------

Yes N.. You are right ...
                
> Assignment performances decreased by 50% because of regionserver.OpenRegionHandler#tickleOpening
> ------------------------------------------------------------------------------------------------
>
>                 Key: HBASE-7247
>                 URL: https://issues.apache.org/jira/browse/HBASE-7247
>             Project: HBase
>          Issue Type: Improvement
>          Components: master, Region Assignment, regionserver
>    Affects Versions: 0.96.0
>            Reporter: nkeywal
>            Assignee: nkeywal
>            Priority: Critical
>
> The regionserver.OpenRegionHandler#tickleOpening updates the region znode as "Do this so master doesn't timeout this region-in-transition.".
> However, on the usual test, this makes the assignment time of 1500 regions goes from 70s to 100s, that is, we're 50% slower because of this.
> More generally, ZooKeper commits to disk all the data update, and this takes time. Using it to provide a keep alive seems overkill. At the very list, it could be made asynchronous.
> I'm not sure how necessary these updates are required (I need to go deeper in the internal, feedback welcome), but it seems very important to optimize this... The trival fix would be to make this optional.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira

[jira] [Commented] (HBASE-7247) Assignment performances decreased by 50% because of regionserver.OpenRegionHandler#tickleOpening

Posted by "nkeywal (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/HBASE-7247?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13510697#comment-13510697 ] 

nkeywal commented on HBASE-7247:
--------------------------------

bq. On the 'post_region_open', IIRC, a bunch of these tickleOpenings were added because we saw issues... in this case, an update of .META. that went in though we'd lost ownership of the region.
Implicitly, it means we still have a race condition here, just that the probability is quite low.

bq. Stepping back (after looking at code), could we drop the notion that a master can intercede and assign a region elsewhere because it is proceeding too slow on a particular region in the name of simplifying the region open handling interaction? There would be less noise in the logs and less states to deal with.
It would be a huge simplification imho. It's worth trying, I would say. It actually makes sense to do it now, because once the current trunk code will be production proven, touching it will be scarier.

bq. Do we fail the open if the following break happens?
Yes, because updateMeta will return false.

bq. We have to call the + zkw.sync(node); ? We always did that? We are doing the sync just to read the old znode value? Do we have to? Could we operate w/ stale read?
Well, we want to be sure that no one else wrote anything. It's often overkill, because we're going to write the znode immediately after, so the sync will occur anyway during the write, as we check the versions during the write. And it's expensive. So there is likely some room for improvement here as well actually.


bq. HRegion region = this.rsServices.getFromOnlineRegions(encodedName);
We don't do anything here actually. We do a get, if the region is in the list, 'region' will not be null and that's it. This variable is set later but not read in between. I can raise an error if it's in the list? Hopefully it won't break anything (famous last words)


                
> Assignment performances decreased by 50% because of regionserver.OpenRegionHandler#tickleOpening
> ------------------------------------------------------------------------------------------------
>
>                 Key: HBASE-7247
>                 URL: https://issues.apache.org/jira/browse/HBASE-7247
>             Project: HBase
>          Issue Type: Improvement
>          Components: master, Region Assignment, regionserver
>    Affects Versions: 0.96.0
>            Reporter: nkeywal
>            Assignee: nkeywal
>            Priority: Critical
>             Fix For: 0.96.0
>
>         Attachments: 7247.v1.patch
>
>
> The regionserver.OpenRegionHandler#tickleOpening updates the region znode as "Do this so master doesn't timeout this region-in-transition.".
> However, on the usual test, this makes the assignment time of 1500 regions goes from 70s to 100s, that is, we're 50% slower because of this.
> More generally, ZooKeper commits to disk all the data update, and this takes time. Using it to provide a keep alive seems overkill. At the very list, it could be made asynchronous.
> I'm not sure how necessary these updates are required (I need to go deeper in the internal, feedback welcome), but it seems very important to optimize this... The trival fix would be to make this optional.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira

[jira] [Commented] (HBASE-7247) Assignment performances decreased by 50% because of regionserver.OpenRegionHandler#tickleOpening

Posted by "nkeywal (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/HBASE-7247?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13510389#comment-13510389 ] 

nkeywal commented on HBASE-7247:
--------------------------------

The issue comes from mixing asynchronous writes with synchronous writes & reads. My patch is now working but adds some complexity because of the added resynchronisation. I'm gonna do simpler. The ideal solution would be to be full async for the region opening (and I'm not gonna do that here :-) )
                
> Assignment performances decreased by 50% because of regionserver.OpenRegionHandler#tickleOpening
> ------------------------------------------------------------------------------------------------
>
>                 Key: HBASE-7247
>                 URL: https://issues.apache.org/jira/browse/HBASE-7247
>             Project: HBase
>          Issue Type: Improvement
>          Components: master, Region Assignment, regionserver
>    Affects Versions: 0.96.0
>            Reporter: nkeywal
>            Assignee: nkeywal
>            Priority: Critical
>
> The regionserver.OpenRegionHandler#tickleOpening updates the region znode as "Do this so master doesn't timeout this region-in-transition.".
> However, on the usual test, this makes the assignment time of 1500 regions goes from 70s to 100s, that is, we're 50% slower because of this.
> More generally, ZooKeper commits to disk all the data update, and this takes time. Using it to provide a keep alive seems overkill. At the very list, it could be made asynchronous.
> I'm not sure how necessary these updates are required (I need to go deeper in the internal, feedback welcome), but it seems very important to optimize this... The trival fix would be to make this optional.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira

[jira] [Updated] (HBASE-7247) Assignment performances decreased by 50% because of regionserver.OpenRegionHandler#tickleOpening

Posted by "nkeywal (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/HBASE-7247?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

nkeywal updated HBASE-7247:
---------------------------

    Attachment: 7247.v1.patch
    
> Assignment performances decreased by 50% because of regionserver.OpenRegionHandler#tickleOpening
> ------------------------------------------------------------------------------------------------
>
>                 Key: HBASE-7247
>                 URL: https://issues.apache.org/jira/browse/HBASE-7247
>             Project: HBase
>          Issue Type: Improvement
>          Components: master, Region Assignment, regionserver
>    Affects Versions: 0.96.0
>            Reporter: nkeywal
>            Assignee: nkeywal
>            Priority: Critical
>             Fix For: 0.96.0
>
>         Attachments: 7247.v1.patch
>
>
> The regionserver.OpenRegionHandler#tickleOpening updates the region znode as "Do this so master doesn't timeout this region-in-transition.".
> However, on the usual test, this makes the assignment time of 1500 regions goes from 70s to 100s, that is, we're 50% slower because of this.
> More generally, ZooKeper commits to disk all the data update, and this takes time. Using it to provide a keep alive seems overkill. At the very list, it could be made asynchronous.
> I'm not sure how necessary these updates are required (I need to go deeper in the internal, feedback welcome), but it seems very important to optimize this... The trival fix would be to make this optional.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira

[jira] [Commented] (HBASE-7247) Assignment performances decreased by 50% because of regionserver.OpenRegionHandler#tickleOpening

Posted by "Jimmy Xiang (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/HBASE-7247?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13507607#comment-13507607 ] 

Jimmy Xiang commented on HBASE-7247:
------------------------------------

If one region server is opening a lot of regions, we just need one handler to tickle the opening.  Master just needs to know the region server is still running so that it doesn't time out the assignment.

To me, this is an overkill. We can combine this logic and something else, for example, to detect if a regionserver is dead.

We used to have the 'owernership' issue as Stack mentioned.  Now, I think we are fine since AM should have a consistent view of region states.
                
> Assignment performances decreased by 50% because of regionserver.OpenRegionHandler#tickleOpening
> ------------------------------------------------------------------------------------------------
>
>                 Key: HBASE-7247
>                 URL: https://issues.apache.org/jira/browse/HBASE-7247
>             Project: HBase
>          Issue Type: Improvement
>          Components: master, Region Assignment, regionserver
>    Affects Versions: 0.96.0
>            Reporter: nkeywal
>            Assignee: nkeywal
>            Priority: Critical
>
> The regionserver.OpenRegionHandler#tickleOpening updates the region znode as "Do this so master doesn't timeout this region-in-transition.".
> However, on the usual test, this makes the assignment time of 1500 regions goes from 70s to 100s, that is, we're 50% slower because of this.
> More generally, ZooKeper commits to disk all the data update, and this takes time. Using it to provide a keep alive seems overkill. At the very list, it could be made asynchronous.
> I'm not sure how necessary these updates are required (I need to go deeper in the internal, feedback welcome), but it seems very important to optimize this... The trival fix would be to make this optional.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira