You are viewing a plain text version of this content. The canonical link for it is here.
Posted to issues@hbase.apache.org by "nkeywal (JIRA)" <ji...@apache.org> on 2012/05/26 01:57:23 UTC

[jira] [Created] (HBASE-6109) Improve RIT performances during assignment on large clusters

nkeywal created HBASE-6109:
------------------------------

             Summary: Improve RIT performances during assignment on large clusters
                 Key: HBASE-6109
                 URL: https://issues.apache.org/jira/browse/HBASE-6109
             Project: HBase
          Issue Type: Improvement
          Components: master
    Affects Versions: 0.96.0
            Reporter: nkeywal
            Assignee: nkeywal
            Priority: Minor


The main points in this patch are:
 - lowering the number of copy of the RIT list
 - lowering the number of synchronization
 - synchronizing on a region rather than on everything

It also contains:
 - some fixes around the RIT notification: the list was sometimes modified without a corresponding 'notify'.
 - some tests flakiness correction, actually unrelated to this patch.


--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira

        

[jira] [Commented] (HBASE-6109) Improve RIT performances during assignment on large clusters

Posted by "nkeywal (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/HBASE-6109?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13286467#comment-13286467 ] 

nkeywal commented on HBASE-6109:
--------------------------------

@ram You're right, I forgot to remove my flakiness detector before doing the patch. Ok, I'm good for a v24 then. I will do it after looking at the test results for the v23...
                
> Improve RIT performances during assignment on large clusters
> ------------------------------------------------------------
>
>                 Key: HBASE-6109
>                 URL: https://issues.apache.org/jira/browse/HBASE-6109
>             Project: HBase
>          Issue Type: Improvement
>          Components: master
>    Affects Versions: 0.96.0
>            Reporter: nkeywal
>            Assignee: nkeywal
>            Priority: Minor
>             Fix For: 0.96.0
>
>         Attachments: 6109.v19.patch, 6109.v21.patch, 6109.v23.patch, 6109.v7.patch
>
>
> The main points in this patch are:
>  - lowering the number of copy of the RIT list
>  - lowering the number of synchronization
>  - synchronizing on a region rather than on everything
> It also contains:
>  - some fixes around the RIT notification: the list was sometimes modified without a corresponding 'notify'.
>  - some tests flakiness correction, actually unrelated to this patch.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira

        

[jira] [Commented] (HBASE-6109) Improve RIT performances during assignment on large clusters

Posted by "nkeywal (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/HBASE-6109?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13286544#comment-13286544 ] 

nkeywal commented on HBASE-6109:
--------------------------------

ok for commit imho.
                
> Improve RIT performances during assignment on large clusters
> ------------------------------------------------------------
>
>                 Key: HBASE-6109
>                 URL: https://issues.apache.org/jira/browse/HBASE-6109
>             Project: HBase
>          Issue Type: Improvement
>          Components: master
>    Affects Versions: 0.96.0
>            Reporter: nkeywal
>            Assignee: nkeywal
>            Priority: Minor
>             Fix For: 0.96.0
>
>         Attachments: 6109.v19.patch, 6109.v21.patch, 6109.v23.patch, 6109.v24.patch, 6109.v7.patch
>
>
> The main points in this patch are:
>  - lowering the number of copy of the RIT list
>  - lowering the number of synchronization
>  - synchronizing on a region rather than on everything
> It also contains:
>  - some fixes around the RIT notification: the list was sometimes modified without a corresponding 'notify'.
>  - some tests flakiness correction, actually unrelated to this patch.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira

        

[jira] [Commented] (HBASE-6109) Improve RIT performances during assignment on large clusters

Posted by "Hadoop QA (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/HBASE-6109?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13286503#comment-13286503 ] 

Hadoop QA commented on HBASE-6109:
----------------------------------

-1 overall.  Here are the results of testing the latest attachment 
  http://issues.apache.org/jira/secure/attachment/12530368/6109.v23.patch
  against trunk revision .

    +1 @author.  The patch does not contain any @author tags.

    +1 tests included.  The patch appears to include 26 new or modified tests.

    +1 hadoop2.0.  The patch compiles against the hadoop 2.0 profile.

    +1 javadoc.  The javadoc tool did not generate any warning messages.

    +1 javac.  The applied patch does not increase the total number of javac compiler warnings.

    -1 findbugs.  The patch appears to cause Findbugs (version 1.3.9) to fail.

    +1 release audit.  The applied patch does not increase the total number of release audit warnings.

     -1 core tests.  The patch failed these unit tests:
                       org.apache.hadoop.hbase.coprocessor.TestRowProcessorEndpoint
                  org.apache.hadoop.hbase.coprocessor.TestRegionServerCoprocessorExceptionWithAbort
                  org.apache.hadoop.hbase.replication.TestReplication

Test results: https://builds.apache.org/job/PreCommit-HBASE-Build/2075//testReport/
Console output: https://builds.apache.org/job/PreCommit-HBASE-Build/2075//console

This message is automatically generated.
                
> Improve RIT performances during assignment on large clusters
> ------------------------------------------------------------
>
>                 Key: HBASE-6109
>                 URL: https://issues.apache.org/jira/browse/HBASE-6109
>             Project: HBase
>          Issue Type: Improvement
>          Components: master
>    Affects Versions: 0.96.0
>            Reporter: nkeywal
>            Assignee: nkeywal
>            Priority: Minor
>             Fix For: 0.96.0
>
>         Attachments: 6109.v19.patch, 6109.v21.patch, 6109.v23.patch, 6109.v7.patch
>
>
> The main points in this patch are:
>  - lowering the number of copy of the RIT list
>  - lowering the number of synchronization
>  - synchronizing on a region rather than on everything
> It also contains:
>  - some fixes around the RIT notification: the list was sometimes modified without a corresponding 'notify'.
>  - some tests flakiness correction, actually unrelated to this patch.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira

        

[jira] [Commented] (HBASE-6109) Improve RIT performances during assignment on large clusters

Posted by "nkeywal (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/HBASE-6109?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13285671#comment-13285671 ] 

nkeywal commented on HBASE-6109:
--------------------------------

I think it's ok for a commit.
>From the code I read, we should have the same behavior as before on split. I will write some parallel tests later on, but I would expect the same behavior as today at least. It may take time as I may encounter some flakiness on this path ;-).
I don't have a test class for NotifiableConcurrentSkipListMap, this class is small so I don't think it's an issue right now. I will push one with the other tests I will write.
                
> Improve RIT performances during assignment on large clusters
> ------------------------------------------------------------
>
>                 Key: HBASE-6109
>                 URL: https://issues.apache.org/jira/browse/HBASE-6109
>             Project: HBase
>          Issue Type: Improvement
>          Components: master
>    Affects Versions: 0.96.0
>            Reporter: nkeywal
>            Assignee: nkeywal
>            Priority: Minor
>         Attachments: 6109.v19.patch, 6109.v7.patch
>
>
> The main points in this patch are:
>  - lowering the number of copy of the RIT list
>  - lowering the number of synchronization
>  - synchronizing on a region rather than on everything
> It also contains:
>  - some fixes around the RIT notification: the list was sometimes modified without a corresponding 'notify'.
>  - some tests flakiness correction, actually unrelated to this patch.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira

        

[jira] [Updated] (HBASE-6109) Improve RIT performances during assignment on large clusters

Posted by "nkeywal (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/HBASE-6109?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

nkeywal updated HBASE-6109:
---------------------------

    Attachment: 6109.v19.patch
    
> Improve RIT performances during assignment on large clusters
> ------------------------------------------------------------
>
>                 Key: HBASE-6109
>                 URL: https://issues.apache.org/jira/browse/HBASE-6109
>             Project: HBase
>          Issue Type: Improvement
>          Components: master
>    Affects Versions: 0.96.0
>            Reporter: nkeywal
>            Assignee: nkeywal
>            Priority: Minor
>         Attachments: 6109.v19.patch, 6109.v7.patch
>
>
> The main points in this patch are:
>  - lowering the number of copy of the RIT list
>  - lowering the number of synchronization
>  - synchronizing on a region rather than on everything
> It also contains:
>  - some fixes around the RIT notification: the list was sometimes modified without a corresponding 'notify'.
>  - some tests flakiness correction, actually unrelated to this patch.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira

        

[jira] [Updated] (HBASE-6109) Improve RIT performances during assignment on large clusters

Posted by "Zhihong Yu (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/HBASE-6109?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Zhihong Yu updated HBASE-6109:
------------------------------

    Fix Version/s: 0.96.0
    
> Improve RIT performances during assignment on large clusters
> ------------------------------------------------------------
>
>                 Key: HBASE-6109
>                 URL: https://issues.apache.org/jira/browse/HBASE-6109
>             Project: HBase
>          Issue Type: Improvement
>          Components: master
>    Affects Versions: 0.96.0
>            Reporter: nkeywal
>            Assignee: nkeywal
>            Priority: Minor
>             Fix For: 0.96.0
>
>         Attachments: 6109.v19.patch, 6109.v7.patch
>
>
> The main points in this patch are:
>  - lowering the number of copy of the RIT list
>  - lowering the number of synchronization
>  - synchronizing on a region rather than on everything
> It also contains:
>  - some fixes around the RIT notification: the list was sometimes modified without a corresponding 'notify'.
>  - some tests flakiness correction, actually unrelated to this patch.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira

        

[jira] [Commented] (HBASE-6109) Improve RIT performances during assignment on large clusters

Posted by "Hadoop QA (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/HBASE-6109?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13285876#comment-13285876 ] 

Hadoop QA commented on HBASE-6109:
----------------------------------

-1 overall.  Here are the results of testing the latest attachment 
  http://issues.apache.org/jira/secure/attachment/12530237/6109.v21.patch
  against trunk revision .

    +1 @author.  The patch does not contain any @author tags.

    +1 tests included.  The patch appears to include 26 new or modified tests.

    +1 hadoop2.0.  The patch compiles against the hadoop 2.0 profile.

    +1 javadoc.  The javadoc tool did not generate any warning messages.

    +1 javac.  The applied patch does not increase the total number of javac compiler warnings.

    -1 findbugs.  The patch appears to cause Findbugs (version 1.3.9) to fail.

    +1 release audit.  The applied patch does not increase the total number of release audit warnings.

     -1 core tests.  The patch failed these unit tests:
                       org.apache.hadoop.hbase.TestDrainingServer
                  org.apache.hadoop.hbase.master.TestAssignmentManager
                  org.apache.hadoop.hbase.security.access.TestZKPermissionsWatcher

Test results: https://builds.apache.org/job/PreCommit-HBASE-Build/2056//testReport/
Console output: https://builds.apache.org/job/PreCommit-HBASE-Build/2056//console

This message is automatically generated.
                
> Improve RIT performances during assignment on large clusters
> ------------------------------------------------------------
>
>                 Key: HBASE-6109
>                 URL: https://issues.apache.org/jira/browse/HBASE-6109
>             Project: HBase
>          Issue Type: Improvement
>          Components: master
>    Affects Versions: 0.96.0
>            Reporter: nkeywal
>            Assignee: nkeywal
>            Priority: Minor
>             Fix For: 0.96.0
>
>         Attachments: 6109.v19.patch, 6109.v21.patch, 6109.v7.patch
>
>
> The main points in this patch are:
>  - lowering the number of copy of the RIT list
>  - lowering the number of synchronization
>  - synchronizing on a region rather than on everything
> It also contains:
>  - some fixes around the RIT notification: the list was sometimes modified without a corresponding 'notify'.
>  - some tests flakiness correction, actually unrelated to this patch.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira

        

[jira] [Commented] (HBASE-6109) Improve RIT performances during assignment on large clusters

Posted by "nkeywal (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/HBASE-6109?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13286505#comment-13286505 ] 

nkeywal commented on HBASE-6109:
--------------------------------

Locally everything is ok and these tests are known as flaky, so I think it's ok. v24 is the version with the comments in TestAssignement.
                
> Improve RIT performances during assignment on large clusters
> ------------------------------------------------------------
>
>                 Key: HBASE-6109
>                 URL: https://issues.apache.org/jira/browse/HBASE-6109
>             Project: HBase
>          Issue Type: Improvement
>          Components: master
>    Affects Versions: 0.96.0
>            Reporter: nkeywal
>            Assignee: nkeywal
>            Priority: Minor
>             Fix For: 0.96.0
>
>         Attachments: 6109.v19.patch, 6109.v21.patch, 6109.v23.patch, 6109.v24.patch, 6109.v7.patch
>
>
> The main points in this patch are:
>  - lowering the number of copy of the RIT list
>  - lowering the number of synchronization
>  - synchronizing on a region rather than on everything
> It also contains:
>  - some fixes around the RIT notification: the list was sometimes modified without a corresponding 'notify'.
>  - some tests flakiness correction, actually unrelated to this patch.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira

        

[jira] [Updated] (HBASE-6109) Improve RIT performances during assignment on large clusters

Posted by "nkeywal (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/HBASE-6109?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

nkeywal updated HBASE-6109:
---------------------------

    Attachment: 6109.v23.patch
    
> Improve RIT performances during assignment on large clusters
> ------------------------------------------------------------
>
>                 Key: HBASE-6109
>                 URL: https://issues.apache.org/jira/browse/HBASE-6109
>             Project: HBase
>          Issue Type: Improvement
>          Components: master
>    Affects Versions: 0.96.0
>            Reporter: nkeywal
>            Assignee: nkeywal
>            Priority: Minor
>             Fix For: 0.96.0
>
>         Attachments: 6109.v19.patch, 6109.v21.patch, 6109.v23.patch, 6109.v7.patch
>
>
> The main points in this patch are:
>  - lowering the number of copy of the RIT list
>  - lowering the number of synchronization
>  - synchronizing on a region rather than on everything
> It also contains:
>  - some fixes around the RIT notification: the list was sometimes modified without a corresponding 'notify'.
>  - some tests flakiness correction, actually unrelated to this patch.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira

        

[jira] [Commented] (HBASE-6109) Improve RIT performances during assignment on large clusters

Posted by "nkeywal (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/HBASE-6109?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13285828#comment-13285828 ] 

nkeywal commented on HBASE-6109:
--------------------------------

bq. Rename TestLocker class to TestKeyLocker ?
Done.

bq. 'shares with' -> 'shared with'
Done.

bq. Indentation in AssignmentManager.addToRITandCallClose() was off. It would be nice to correct the existing lines.
Done

bq. 'share synchronized' -> 'synchronized'. Remove the 'todo nli:' at the end.
Done

bq. Insert spaces around = sign.
Done.

bq. 'are in' -> 'is in'
Done

bq. Why not call the method clone() ?
We don't really want the NotifiableConcurrentSkipListMap to be cloneable: however, some functions want to work on a copy of the data structure, for reporting or test (with all the 'Map' semantic), hence the internal clone.

bq. Suppose delegatee is empty upon entry to the above method, what if an entry is added after the isEmpty() check ?
It will be equivalent to adding it just after the clear.

bq.  'A number' -> 'The number'
Done.

bq. 'number of people' -> 'number of users'
Done

bq. 'it's equals to zero.' -> 'it's equal to zero.'
Done

bq. The outer class is generic. The inner class shouldn't mention Region.
Done

                
> Improve RIT performances during assignment on large clusters
> ------------------------------------------------------------
>
>                 Key: HBASE-6109
>                 URL: https://issues.apache.org/jira/browse/HBASE-6109
>             Project: HBase
>          Issue Type: Improvement
>          Components: master
>    Affects Versions: 0.96.0
>            Reporter: nkeywal
>            Assignee: nkeywal
>            Priority: Minor
>             Fix For: 0.96.0
>
>         Attachments: 6109.v19.patch, 6109.v7.patch
>
>
> The main points in this patch are:
>  - lowering the number of copy of the RIT list
>  - lowering the number of synchronization
>  - synchronizing on a region rather than on everything
> It also contains:
>  - some fixes around the RIT notification: the list was sometimes modified without a corresponding 'notify'.
>  - some tests flakiness correction, actually unrelated to this patch.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira

        

[jira] [Commented] (HBASE-6109) Improve RIT performances during assignment on large clusters

Posted by "nkeywal (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/HBASE-6109?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13284843#comment-13284843 ] 

nkeywal commented on HBASE-6109:
--------------------------------

@stack

bq. Is this a generic locker? Should it be named for what its locking?
Renamed to LockerByString. If you have a better name...

bq. NotifiableConcurrentSkipListMap needs class comment. It seems like its for use in a very particular circumstance. It needs explaining.
done.

bq. Does it need to be public? Only used in master package? Perhaps make it package private then?

The issue was:
{noformat}
  public NotifiableConcurrentSkipListMap<String, RegionState> getRegionsInTransition() {
    return regionsInTransition;
  }
{noformat}

But it's used in tests only, so I can actually make both package protected. Done.

bq. internalList is a bad name for the internal delegate instance. Is 'delegatee' a better name than internalList?
done.


bq. We checked rit contains a name but then in a separate statement we do the waitForListUpdate? What if the region we are looking for is removed between the check and the waitForListUpdate invocation?
Actually yes, it could happen. I added a timeout, so we will now check every 100ms.


bq. Will this log be annoying?
Removed. I added them while debugging.

This one was already there however. I kept it.
{noformat}
  public void removeClosedRegion(HRegionInfo hri) {
    if (regionsToReopen.remove(hri.getEncodedName()) != null) {
      LOG.debug("Removed region from reopening regions because it was closed");
    }
  }
{noformat}


bq. Is this true / How is it enforced?
Oops, it not enforced (I don't know I could do it), but it's also not true: the update will set it as well. But it's not an issue as it's an atomic long. Comment updated.
It's btw tempting to:
 - change the implementation of updateTimestampToNow to use a lazySet
 - get the timestamp only once before looping on the region set.

I didn't do it in my patch, but I think it should be done. 

bq. needs space after curly parens. Sometimes you do it and sometimes you don't.
Done



> @ted

bq. It would be nice to have a test for NotifiableConcurrentSkipListMap.
Will do for final release.

bq. Since internalList is actually a Map, name the above method waitForUpdate() ?
Done.

bq. the above should read 'A utility class to manage a set of locks. Each lock is identified by a String which serves'
Done

bq. It should be Locker.class
Done

bq. The constant should be named NB_CONCURRENT_LOCKS.
Done

bq.The last word should be locked.
Done

bq. It would be nice to add more about reason.
Done.

bq. Looking at batchRemove() of http://www.docjar.com/html/api/java/util/ArrayList.java.html around line 669, I don't see synchronization. Meaning, existence check of elements from nodes in regionsInTransition.keySet() may not be deterministic.

After looking at the java api code, I don't think there is an issue here. The set we're using is documented as: "The view's iterator is a "weakly consistent" iterator that will never throw ConcurrentModificationException, and guarantees to traverse elements as they existed upon construction of the iterator, and may (but is not guaranteed to) reflect any modifications subsequent to construction.". So we won't have any java error. Then, if an element is added/removed to/from the RIT while we're doing the removeAll, it may be added/removed or not, but we're not less deterministic that we would be by adding a lock around the removeAll: the add/remove could be as well be done just before/after we take the lock, and we would not know it.



I'm currently checking how it works with split, then I will update it to the current trunk.
                
> Improve RIT performances during assignment on large clusters
> ------------------------------------------------------------
>
>                 Key: HBASE-6109
>                 URL: https://issues.apache.org/jira/browse/HBASE-6109
>             Project: HBase
>          Issue Type: Improvement
>          Components: master
>    Affects Versions: 0.96.0
>            Reporter: nkeywal
>            Assignee: nkeywal
>            Priority: Minor
>         Attachments: 6109.v7.patch
>
>
> The main points in this patch are:
>  - lowering the number of copy of the RIT list
>  - lowering the number of synchronization
>  - synchronizing on a region rather than on everything
> It also contains:
>  - some fixes around the RIT notification: the list was sometimes modified without a corresponding 'notify'.
>  - some tests flakiness correction, actually unrelated to this patch.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira

        

[jira] [Updated] (HBASE-6109) Improve RIT performances during assignment on large clusters

Posted by "nkeywal (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/HBASE-6109?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

nkeywal updated HBASE-6109:
---------------------------

    Attachment: 6109.v21.patch
    
> Improve RIT performances during assignment on large clusters
> ------------------------------------------------------------
>
>                 Key: HBASE-6109
>                 URL: https://issues.apache.org/jira/browse/HBASE-6109
>             Project: HBase
>          Issue Type: Improvement
>          Components: master
>    Affects Versions: 0.96.0
>            Reporter: nkeywal
>            Assignee: nkeywal
>            Priority: Minor
>             Fix For: 0.96.0
>
>         Attachments: 6109.v19.patch, 6109.v21.patch, 6109.v7.patch
>
>
> The main points in this patch are:
>  - lowering the number of copy of the RIT list
>  - lowering the number of synchronization
>  - synchronizing on a region rather than on everything
> It also contains:
>  - some fixes around the RIT notification: the list was sometimes modified without a corresponding 'notify'.
>  - some tests flakiness correction, actually unrelated to this patch.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira

        

[jira] [Commented] (HBASE-6109) Improve RIT performances during assignment on large clusters

Posted by "stack (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/HBASE-6109?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13285229#comment-13285229 ] 

stack commented on HBASE-6109:
------------------------------

bq. Renamed to LockerByString. If you have a better name...

regionLocker?


                
> Improve RIT performances during assignment on large clusters
> ------------------------------------------------------------
>
>                 Key: HBASE-6109
>                 URL: https://issues.apache.org/jira/browse/HBASE-6109
>             Project: HBase
>          Issue Type: Improvement
>          Components: master
>    Affects Versions: 0.96.0
>            Reporter: nkeywal
>            Assignee: nkeywal
>            Priority: Minor
>         Attachments: 6109.v7.patch
>
>
> The main points in this patch are:
>  - lowering the number of copy of the RIT list
>  - lowering the number of synchronization
>  - synchronizing on a region rather than on everything
> It also contains:
>  - some fixes around the RIT notification: the list was sometimes modified without a corresponding 'notify'.
>  - some tests flakiness correction, actually unrelated to this patch.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira

        

[jira] [Updated] (HBASE-6109) Improve RIT performances during assignment on large clusters

Posted by "stack (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/HBASE-6109?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

stack updated HBASE-6109:
-------------------------

    Status: Patch Available  (was: Open)
    
> Improve RIT performances during assignment on large clusters
> ------------------------------------------------------------
>
>                 Key: HBASE-6109
>                 URL: https://issues.apache.org/jira/browse/HBASE-6109
>             Project: HBase
>          Issue Type: Improvement
>          Components: master
>    Affects Versions: 0.96.0
>            Reporter: nkeywal
>            Assignee: nkeywal
>            Priority: Minor
>             Fix For: 0.96.0
>
>         Attachments: 6109.v19.patch, 6109.v21.patch, 6109.v7.patch
>
>
> The main points in this patch are:
>  - lowering the number of copy of the RIT list
>  - lowering the number of synchronization
>  - synchronizing on a region rather than on everything
> It also contains:
>  - some fixes around the RIT notification: the list was sometimes modified without a corresponding 'notify'.
>  - some tests flakiness correction, actually unrelated to this patch.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira

        

[jira] [Updated] (HBASE-6109) Improve RIT performances during assignment on large clusters

Posted by "Zhihong Yu (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/HBASE-6109?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Zhihong Yu updated HBASE-6109:
------------------------------

    Attachment: 6049-v3.patch

Patch v3 rebased on trunk.
                
> Improve RIT performances during assignment on large clusters
> ------------------------------------------------------------
>
>                 Key: HBASE-6109
>                 URL: https://issues.apache.org/jira/browse/HBASE-6109
>             Project: HBase
>          Issue Type: Improvement
>          Components: master
>    Affects Versions: 0.96.0
>            Reporter: nkeywal
>            Assignee: nkeywal
>            Priority: Minor
>             Fix For: 0.96.0
>
>         Attachments: 6109.v19.patch, 6109.v7.patch
>
>
> The main points in this patch are:
>  - lowering the number of copy of the RIT list
>  - lowering the number of synchronization
>  - synchronizing on a region rather than on everything
> It also contains:
>  - some fixes around the RIT notification: the list was sometimes modified without a corresponding 'notify'.
>  - some tests flakiness correction, actually unrelated to this patch.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira

        

[jira] [Commented] (HBASE-6109) Improve RIT performances during assignment on large clusters

Posted by "Zhihong Yu (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/HBASE-6109?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13285840#comment-13285840 ] 

Zhihong Yu commented on HBASE-6109:
-----------------------------------

@N:
Thanks for the quick turn-around.
                
> Improve RIT performances during assignment on large clusters
> ------------------------------------------------------------
>
>                 Key: HBASE-6109
>                 URL: https://issues.apache.org/jira/browse/HBASE-6109
>             Project: HBase
>          Issue Type: Improvement
>          Components: master
>    Affects Versions: 0.96.0
>            Reporter: nkeywal
>            Assignee: nkeywal
>            Priority: Minor
>             Fix For: 0.96.0
>
>         Attachments: 6109.v19.patch, 6109.v21.patch, 6109.v7.patch
>
>
> The main points in this patch are:
>  - lowering the number of copy of the RIT list
>  - lowering the number of synchronization
>  - synchronizing on a region rather than on everything
> It also contains:
>  - some fixes around the RIT notification: the list was sometimes modified without a corresponding 'notify'.
>  - some tests flakiness correction, actually unrelated to this patch.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira

        

[jira] [Commented] (HBASE-6109) Improve RIT performances during assignment on large clusters

Posted by "nkeywal (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/HBASE-6109?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13285909#comment-13285909 ] 

nkeywal commented on HBASE-6109:
--------------------------------

I need to have a look at this one. org.apache.hadoop.hbase.master.TestAssignmentManager.testRegionInOpeningStateOnDeadRSWhileMasterFailover 
                
> Improve RIT performances during assignment on large clusters
> ------------------------------------------------------------
>
>                 Key: HBASE-6109
>                 URL: https://issues.apache.org/jira/browse/HBASE-6109
>             Project: HBase
>          Issue Type: Improvement
>          Components: master
>    Affects Versions: 0.96.0
>            Reporter: nkeywal
>            Assignee: nkeywal
>            Priority: Minor
>             Fix For: 0.96.0
>
>         Attachments: 6109.v19.patch, 6109.v21.patch, 6109.v7.patch
>
>
> The main points in this patch are:
>  - lowering the number of copy of the RIT list
>  - lowering the number of synchronization
>  - synchronizing on a region rather than on everything
> It also contains:
>  - some fixes around the RIT notification: the list was sometimes modified without a corresponding 'notify'.
>  - some tests flakiness correction, actually unrelated to this patch.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira

        

[jira] [Commented] (HBASE-6109) Improve RIT performances during assignment on large clusters

Posted by "Hudson (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/HBASE-6109?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13287037#comment-13287037 ] 

Hudson commented on HBASE-6109:
-------------------------------

Integrated in HBase-TRUNK-on-Hadoop-2.0.0 #35 (See [https://builds.apache.org/job/HBase-TRUNK-on-Hadoop-2.0.0/35/])
    HBASE-6109 Improve RIT performances during assignment on large clusters (N Keywal) (Revision 1344802)

     Result = FAILURE
tedyu : 
Files : 
* /hbase/trunk/hbase-server/src/main/jamon/org/apache/hadoop/hbase/tmpl/master/AssignmentManagerStatusTmpl.jamon
* /hbase/trunk/hbase-server/src/main/java/org/apache/hadoop/hbase/master/AssignmentManager.java
* /hbase/trunk/hbase-server/src/main/java/org/apache/hadoop/hbase/master/HMaster.java
* /hbase/trunk/hbase-server/src/main/java/org/apache/hadoop/hbase/master/MXBeanImpl.java
* /hbase/trunk/hbase-server/src/main/java/org/apache/hadoop/hbase/master/MasterDumpServlet.java
* /hbase/trunk/hbase-server/src/main/java/org/apache/hadoop/hbase/master/NotifiableConcurrentSkipListMap.java
* /hbase/trunk/hbase-server/src/main/java/org/apache/hadoop/hbase/util/KeyLocker.java
* /hbase/trunk/hbase-server/src/test/java/org/apache/hadoop/hbase/TestDrainingServer.java
* /hbase/trunk/hbase-server/src/test/java/org/apache/hadoop/hbase/client/TestHCM.java
* /hbase/trunk/hbase-server/src/test/java/org/apache/hadoop/hbase/coprocessor/TestMasterObserver.java
* /hbase/trunk/hbase-server/src/test/java/org/apache/hadoop/hbase/master/Mocking.java
* /hbase/trunk/hbase-server/src/test/java/org/apache/hadoop/hbase/master/TestAssignmentManager.java
* /hbase/trunk/hbase-server/src/test/java/org/apache/hadoop/hbase/master/TestMasterFailover.java
* /hbase/trunk/hbase-server/src/test/java/org/apache/hadoop/hbase/master/TestMasterNoCluster.java
* /hbase/trunk/hbase-server/src/test/java/org/apache/hadoop/hbase/master/TestMasterStatusServlet.java
* /hbase/trunk/hbase-server/src/test/java/org/apache/hadoop/hbase/util/TestKeyLocker.java

                
> Improve RIT performances during assignment on large clusters
> ------------------------------------------------------------
>
>                 Key: HBASE-6109
>                 URL: https://issues.apache.org/jira/browse/HBASE-6109
>             Project: HBase
>          Issue Type: Improvement
>          Components: master
>    Affects Versions: 0.96.0
>            Reporter: nkeywal
>            Assignee: nkeywal
>            Priority: Minor
>             Fix For: 0.96.0
>
>         Attachments: 6109.v19.patch, 6109.v21.patch, 6109.v23.patch, 6109.v24.patch, 6109.v7.patch
>
>
> The main points in this patch are:
>  - lowering the number of copy of the RIT list
>  - lowering the number of synchronization
>  - synchronizing on a region rather than on everything
> It also contains:
>  - some fixes around the RIT notification: the list was sometimes modified without a corresponding 'notify'.
>  - some tests flakiness correction, actually unrelated to this patch.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira

        

[jira] [Updated] (HBASE-6109) Improve RIT performances during assignment on large clusters

Posted by "nkeywal (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/HBASE-6109?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

nkeywal updated HBASE-6109:
---------------------------

    Status: Open  (was: Patch Available)
    
> Improve RIT performances during assignment on large clusters
> ------------------------------------------------------------
>
>                 Key: HBASE-6109
>                 URL: https://issues.apache.org/jira/browse/HBASE-6109
>             Project: HBase
>          Issue Type: Improvement
>          Components: master
>    Affects Versions: 0.96.0
>            Reporter: nkeywal
>            Assignee: nkeywal
>            Priority: Minor
>             Fix For: 0.96.0
>
>         Attachments: 6109.v19.patch, 6109.v7.patch
>
>
> The main points in this patch are:
>  - lowering the number of copy of the RIT list
>  - lowering the number of synchronization
>  - synchronizing on a region rather than on everything
> It also contains:
>  - some fixes around the RIT notification: the list was sometimes modified without a corresponding 'notify'.
>  - some tests flakiness correction, actually unrelated to this patch.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira

        

[jira] [Updated] (HBASE-6109) Improve RIT performances during assignment on large clusters

Posted by "nkeywal (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/HBASE-6109?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

nkeywal updated HBASE-6109:
---------------------------

    Attachment: 6109.v24.patch
    
> Improve RIT performances during assignment on large clusters
> ------------------------------------------------------------
>
>                 Key: HBASE-6109
>                 URL: https://issues.apache.org/jira/browse/HBASE-6109
>             Project: HBase
>          Issue Type: Improvement
>          Components: master
>    Affects Versions: 0.96.0
>            Reporter: nkeywal
>            Assignee: nkeywal
>            Priority: Minor
>             Fix For: 0.96.0
>
>         Attachments: 6109.v19.patch, 6109.v21.patch, 6109.v23.patch, 6109.v24.patch, 6109.v7.patch
>
>
> The main points in this patch are:
>  - lowering the number of copy of the RIT list
>  - lowering the number of synchronization
>  - synchronizing on a region rather than on everything
> It also contains:
>  - some fixes around the RIT notification: the list was sometimes modified without a corresponding 'notify'.
>  - some tests flakiness correction, actually unrelated to this patch.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira

        

[jira] [Updated] (HBASE-6109) Improve RIT performances during assignment on large clusters

Posted by "nkeywal (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/HBASE-6109?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

nkeywal updated HBASE-6109:
---------------------------

    Status: Patch Available  (was: Open)
    
> Improve RIT performances during assignment on large clusters
> ------------------------------------------------------------
>
>                 Key: HBASE-6109
>                 URL: https://issues.apache.org/jira/browse/HBASE-6109
>             Project: HBase
>          Issue Type: Improvement
>          Components: master
>    Affects Versions: 0.96.0
>            Reporter: nkeywal
>            Assignee: nkeywal
>            Priority: Minor
>             Fix For: 0.96.0
>
>         Attachments: 6109.v19.patch, 6109.v21.patch, 6109.v23.patch, 6109.v7.patch
>
>
> The main points in this patch are:
>  - lowering the number of copy of the RIT list
>  - lowering the number of synchronization
>  - synchronizing on a region rather than on everything
> It also contains:
>  - some fixes around the RIT notification: the list was sometimes modified without a corresponding 'notify'.
>  - some tests flakiness correction, actually unrelated to this patch.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira

        

[jira] [Commented] (HBASE-6109) Improve RIT performances during assignment on large clusters

Posted by "nkeywal (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/HBASE-6109?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13286401#comment-13286401 ] 

nkeywal commented on HBASE-6109:
--------------------------------

testRegionInOpeningStateOnDeadRSWhileMasterFailover fails at this line:
{noformat}
  public void testRegionInOpeningStateOnDeadRSWhileMasterFailover() throws IOException,
      KeeperException, ServiceException, InterruptedException {
    AssignmentManagerWithExtrasForTesting am = setUpMockedAssignmentManager(this.server,
        this.serverManager);
    ZKAssign.createNodeOffline(this.watcher, REGIONINFO, SERVERNAME_A);   <============== FAILED HERE: KeeperErrorCode = NodeExists for /hbase/unassigned/5c7fe078551611acb0923a9ca0e1e1f4
{noformat}

So it's more a test error. This node should be deleted in the after() clause of the previous, for whatever reason it was not or was recreated after the delete. Investigating...
                
> Improve RIT performances during assignment on large clusters
> ------------------------------------------------------------
>
>                 Key: HBASE-6109
>                 URL: https://issues.apache.org/jira/browse/HBASE-6109
>             Project: HBase
>          Issue Type: Improvement
>          Components: master
>    Affects Versions: 0.96.0
>            Reporter: nkeywal
>            Assignee: nkeywal
>            Priority: Minor
>             Fix For: 0.96.0
>
>         Attachments: 6109.v19.patch, 6109.v21.patch, 6109.v7.patch
>
>
> The main points in this patch are:
>  - lowering the number of copy of the RIT list
>  - lowering the number of synchronization
>  - synchronizing on a region rather than on everything
> It also contains:
>  - some fixes around the RIT notification: the list was sometimes modified without a corresponding 'notify'.
>  - some tests flakiness correction, actually unrelated to this patch.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira

        

[jira] [Commented] (HBASE-6109) Improve RIT performances during assignment on large clusters

Posted by "stack (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/HBASE-6109?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13284565#comment-13284565 ] 

stack commented on HBASE-6109:
------------------------------

@nkeywal Trunk should have settled now.  Can you redo your patch so its against the hbase root dir?

{code}+  final private Locker locker = new Locker();{code}

Is this a generic locker?  Should it be named for what its locking?

NotifiableConcurrentSkipListMap needs class comment.  It seems like its for use in a very particular circumstance.  It needs explaining.

Does it need to be public?  Only used in master package?   Perhaps make it package private then?

internalList is a bad name for the internal delegate instance.  Is 'delegatee' a better name than internalList?

For sure this is ok?

{code}
+    while (!this.master.isStopped() &&
+      // no lock concurrent access ok: sequentially consistent
+      this.regionsInTransition.containsKey(hri.getEncodedName())) {
+      this.regionsInTransition.waitForListUpdate();
     }
{code}

We checked rit contains a name but then in a separate statement we do the waitForListUpdate?  What if the region we are looking for is removed between the check and the waitForListUpdate invocation?

Will this log be annoying?

{code}
+      LOG.info("regionState=" + regionState +
+        " failoverProcessedRegions.containsKey(encodedRegionName)=" + failoverProcessedRegions.containsKey(encodedRegionName));
{code}

This too... '+      LOG.info("et=" + et);'?

.. and this '+        LOG.info("regionInfo.isMetaTable()=" + regionInfo.isMetaTable());'?

Add the region removed to the log message here? +      LOG.debug("Removed region from reopening regions because it was closed");?

Sometimes your indents are off.  For example:

{code}
-    synchronized (regionsInTransition) {
+      // We need a lock here as we're going to do a put later and we don't want multiple state
+      //  creation
+    Reentran....
{code}

There are gratuitious reformattings of code.

Is this true:

{code}
+      // no lock concurrency ok: there is a write when we update the timestamp but it's ok
+      //  as its the only one updating this field
+      RegionState rs = this.regionsInTransition.get(e.getKey());
{code}

How is it enforced?

Check these...

{code}

+    }finally {



 or here


+      }else {

{code}

needs space after curly parens.  Sometimes you do it and sometimes you don't.



I reviewed half of the patch.

It looks great.  Nice stuff N.
                
> Improve RIT performances during assignment on large clusters
> ------------------------------------------------------------
>
>                 Key: HBASE-6109
>                 URL: https://issues.apache.org/jira/browse/HBASE-6109
>             Project: HBase
>          Issue Type: Improvement
>          Components: master
>    Affects Versions: 0.96.0
>            Reporter: nkeywal
>            Assignee: nkeywal
>            Priority: Minor
>         Attachments: 6109.v7.patch
>
>
> The main points in this patch are:
>  - lowering the number of copy of the RIT list
>  - lowering the number of synchronization
>  - synchronizing on a region rather than on everything
> It also contains:
>  - some fixes around the RIT notification: the list was sometimes modified without a corresponding 'notify'.
>  - some tests flakiness correction, actually unrelated to this patch.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira

        

[jira] [Updated] (HBASE-6109) Improve RIT performances during assignment on large clusters

Posted by "Zhihong Yu (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/HBASE-6109?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Zhihong Yu updated HBASE-6109:
------------------------------

    Attachment:     (was: 6049-v3.patch)
    
> Improve RIT performances during assignment on large clusters
> ------------------------------------------------------------
>
>                 Key: HBASE-6109
>                 URL: https://issues.apache.org/jira/browse/HBASE-6109
>             Project: HBase
>          Issue Type: Improvement
>          Components: master
>    Affects Versions: 0.96.0
>            Reporter: nkeywal
>            Assignee: nkeywal
>            Priority: Minor
>             Fix For: 0.96.0
>
>         Attachments: 6109.v19.patch, 6109.v7.patch
>
>
> The main points in this patch are:
>  - lowering the number of copy of the RIT list
>  - lowering the number of synchronization
>  - synchronizing on a region rather than on everything
> It also contains:
>  - some fixes around the RIT notification: the list was sometimes modified without a corresponding 'notify'.
>  - some tests flakiness correction, actually unrelated to this patch.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira

        

[jira] [Commented] (HBASE-6109) Improve RIT performances during assignment on large clusters

Posted by "Zhihong Yu (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/HBASE-6109?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13283841#comment-13283841 ] 

Zhihong Yu commented on HBASE-6109:
-----------------------------------

Looking forward to your patch, N.
                
> Improve RIT performances during assignment on large clusters
> ------------------------------------------------------------
>
>                 Key: HBASE-6109
>                 URL: https://issues.apache.org/jira/browse/HBASE-6109
>             Project: HBase
>          Issue Type: Improvement
>          Components: master
>    Affects Versions: 0.96.0
>            Reporter: nkeywal
>            Assignee: nkeywal
>            Priority: Minor
>
> The main points in this patch are:
>  - lowering the number of copy of the RIT list
>  - lowering the number of synchronization
>  - synchronizing on a region rather than on everything
> It also contains:
>  - some fixes around the RIT notification: the list was sometimes modified without a corresponding 'notify'.
>  - some tests flakiness correction, actually unrelated to this patch.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira

        

[jira] [Updated] (HBASE-6109) Improve RIT performances during assignment on large clusters

Posted by "nkeywal (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/HBASE-6109?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

nkeywal updated HBASE-6109:
---------------------------

    Status: Open  (was: Patch Available)
    
> Improve RIT performances during assignment on large clusters
> ------------------------------------------------------------
>
>                 Key: HBASE-6109
>                 URL: https://issues.apache.org/jira/browse/HBASE-6109
>             Project: HBase
>          Issue Type: Improvement
>          Components: master
>    Affects Versions: 0.96.0
>            Reporter: nkeywal
>            Assignee: nkeywal
>            Priority: Minor
>             Fix For: 0.96.0
>
>         Attachments: 6109.v19.patch, 6109.v21.patch, 6109.v23.patch, 6109.v7.patch
>
>
> The main points in this patch are:
>  - lowering the number of copy of the RIT list
>  - lowering the number of synchronization
>  - synchronizing on a region rather than on everything
> It also contains:
>  - some fixes around the RIT notification: the list was sometimes modified without a corresponding 'notify'.
>  - some tests flakiness correction, actually unrelated to this patch.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira

        

[jira] [Commented] (HBASE-6109) Improve RIT performances during assignment on large clusters

Posted by "Hadoop QA (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/HBASE-6109?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13285640#comment-13285640 ] 

Hadoop QA commented on HBASE-6109:
----------------------------------

-1 overall.  Here are the results of testing the latest attachment 
  http://issues.apache.org/jira/secure/attachment/12530195/6109.v19.patch
  against trunk revision .

    +1 @author.  The patch does not contain any @author tags.

    +1 tests included.  The patch appears to include 26 new or modified tests.

    +1 hadoop2.0.  The patch compiles against the hadoop 2.0 profile.

    +1 javadoc.  The javadoc tool did not generate any warning messages.

    +1 javac.  The applied patch does not increase the total number of javac compiler warnings.

    -1 findbugs.  The patch appears to cause Findbugs (version 1.3.9) to fail.

    +1 release audit.  The applied patch does not increase the total number of release audit warnings.

    +1 core tests.  The patch passed unit tests in .

Test results: https://builds.apache.org/job/PreCommit-HBASE-Build/2052//testReport/
Console output: https://builds.apache.org/job/PreCommit-HBASE-Build/2052//console

This message is automatically generated.
                
> Improve RIT performances during assignment on large clusters
> ------------------------------------------------------------
>
>                 Key: HBASE-6109
>                 URL: https://issues.apache.org/jira/browse/HBASE-6109
>             Project: HBase
>          Issue Type: Improvement
>          Components: master
>    Affects Versions: 0.96.0
>            Reporter: nkeywal
>            Assignee: nkeywal
>            Priority: Minor
>         Attachments: 6109.v19.patch, 6109.v7.patch
>
>
> The main points in this patch are:
>  - lowering the number of copy of the RIT list
>  - lowering the number of synchronization
>  - synchronizing on a region rather than on everything
> It also contains:
>  - some fixes around the RIT notification: the list was sometimes modified without a corresponding 'notify'.
>  - some tests flakiness correction, actually unrelated to this patch.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira

        

[jira] [Commented] (HBASE-6109) Improve RIT performances during assignment on large clusters

Posted by "Zhihong Yu (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/HBASE-6109?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13285821#comment-13285821 ] 

Zhihong Yu commented on HBASE-6109:
-----------------------------------

{code}
+  // A number of lock we want to easily support. It's not a maximum.
{code}
'A number' -> 'The number'
{code}
+  // We need an atomic counter to manage the number of people using the lock and free it when
+  //  it's equals to zero.
{code}
'number of people' -> 'number of users'
'it's equals to zero.' -> 'it's equal to zero.'
{code}
+  static class RegionLock<K> extends ReentrantLock {
{code}
The outer class is generic. The inner class shouldn't mention Region.
                
> Improve RIT performances during assignment on large clusters
> ------------------------------------------------------------
>
>                 Key: HBASE-6109
>                 URL: https://issues.apache.org/jira/browse/HBASE-6109
>             Project: HBase
>          Issue Type: Improvement
>          Components: master
>    Affects Versions: 0.96.0
>            Reporter: nkeywal
>            Assignee: nkeywal
>            Priority: Minor
>             Fix For: 0.96.0
>
>         Attachments: 6109.v19.patch, 6109.v7.patch
>
>
> The main points in this patch are:
>  - lowering the number of copy of the RIT list
>  - lowering the number of synchronization
>  - synchronizing on a region rather than on everything
> It also contains:
>  - some fixes around the RIT notification: the list was sometimes modified without a corresponding 'notify'.
>  - some tests flakiness correction, actually unrelated to this patch.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira

        

[jira] [Commented] (HBASE-6109) Improve RIT performances during assignment on large clusters

Posted by "Zhihong Yu (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/HBASE-6109?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13286718#comment-13286718 ] 

Zhihong Yu commented on HBASE-6109:
-----------------------------------

In ZKUtil.java, the only change was:
{code}
+      LOG.debug("Deleting node "+node);
{code}
Can I take this out at time of integration ?
                
> Improve RIT performances during assignment on large clusters
> ------------------------------------------------------------
>
>                 Key: HBASE-6109
>                 URL: https://issues.apache.org/jira/browse/HBASE-6109
>             Project: HBase
>          Issue Type: Improvement
>          Components: master
>    Affects Versions: 0.96.0
>            Reporter: nkeywal
>            Assignee: nkeywal
>            Priority: Minor
>             Fix For: 0.96.0
>
>         Attachments: 6109.v19.patch, 6109.v21.patch, 6109.v23.patch, 6109.v24.patch, 6109.v7.patch
>
>
> The main points in this patch are:
>  - lowering the number of copy of the RIT list
>  - lowering the number of synchronization
>  - synchronizing on a region rather than on everything
> It also contains:
>  - some fixes around the RIT notification: the list was sometimes modified without a corresponding 'notify'.
>  - some tests flakiness correction, actually unrelated to this patch.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira

        

[jira] [Updated] (HBASE-6109) Improve RIT performances during assignment on large clusters

Posted by "nkeywal (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/HBASE-6109?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

nkeywal updated HBASE-6109:
---------------------------

    Status: Patch Available  (was: Open)
    
> Improve RIT performances during assignment on large clusters
> ------------------------------------------------------------
>
>                 Key: HBASE-6109
>                 URL: https://issues.apache.org/jira/browse/HBASE-6109
>             Project: HBase
>          Issue Type: Improvement
>          Components: master
>    Affects Versions: 0.96.0
>            Reporter: nkeywal
>            Assignee: nkeywal
>            Priority: Minor
>         Attachments: 6109.v19.patch, 6109.v7.patch
>
>
> The main points in this patch are:
>  - lowering the number of copy of the RIT list
>  - lowering the number of synchronization
>  - synchronizing on a region rather than on everything
> It also contains:
>  - some fixes around the RIT notification: the list was sometimes modified without a corresponding 'notify'.
>  - some tests flakiness correction, actually unrelated to this patch.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira

        

[jira] [Commented] (HBASE-6109) Improve RIT performances during assignment on large clusters

Posted by "Hudson (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/HBASE-6109?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13286788#comment-13286788 ] 

Hudson commented on HBASE-6109:
-------------------------------

Integrated in HBase-TRUNK #2966 (See [https://builds.apache.org/job/HBase-TRUNK/2966/])
    HBASE-6109 Improve RIT performances during assignment on large clusters (N Keywal) (Revision 1344802)

     Result = FAILURE
tedyu : 
Files : 
* /hbase/trunk/hbase-server/src/main/jamon/org/apache/hadoop/hbase/tmpl/master/AssignmentManagerStatusTmpl.jamon
* /hbase/trunk/hbase-server/src/main/java/org/apache/hadoop/hbase/master/AssignmentManager.java
* /hbase/trunk/hbase-server/src/main/java/org/apache/hadoop/hbase/master/HMaster.java
* /hbase/trunk/hbase-server/src/main/java/org/apache/hadoop/hbase/master/MXBeanImpl.java
* /hbase/trunk/hbase-server/src/main/java/org/apache/hadoop/hbase/master/MasterDumpServlet.java
* /hbase/trunk/hbase-server/src/main/java/org/apache/hadoop/hbase/master/NotifiableConcurrentSkipListMap.java
* /hbase/trunk/hbase-server/src/main/java/org/apache/hadoop/hbase/util/KeyLocker.java
* /hbase/trunk/hbase-server/src/test/java/org/apache/hadoop/hbase/TestDrainingServer.java
* /hbase/trunk/hbase-server/src/test/java/org/apache/hadoop/hbase/client/TestHCM.java
* /hbase/trunk/hbase-server/src/test/java/org/apache/hadoop/hbase/coprocessor/TestMasterObserver.java
* /hbase/trunk/hbase-server/src/test/java/org/apache/hadoop/hbase/master/Mocking.java
* /hbase/trunk/hbase-server/src/test/java/org/apache/hadoop/hbase/master/TestAssignmentManager.java
* /hbase/trunk/hbase-server/src/test/java/org/apache/hadoop/hbase/master/TestMasterFailover.java
* /hbase/trunk/hbase-server/src/test/java/org/apache/hadoop/hbase/master/TestMasterNoCluster.java
* /hbase/trunk/hbase-server/src/test/java/org/apache/hadoop/hbase/master/TestMasterStatusServlet.java
* /hbase/trunk/hbase-server/src/test/java/org/apache/hadoop/hbase/util/TestKeyLocker.java

                
> Improve RIT performances during assignment on large clusters
> ------------------------------------------------------------
>
>                 Key: HBASE-6109
>                 URL: https://issues.apache.org/jira/browse/HBASE-6109
>             Project: HBase
>          Issue Type: Improvement
>          Components: master
>    Affects Versions: 0.96.0
>            Reporter: nkeywal
>            Assignee: nkeywal
>            Priority: Minor
>             Fix For: 0.96.0
>
>         Attachments: 6109.v19.patch, 6109.v21.patch, 6109.v23.patch, 6109.v24.patch, 6109.v7.patch
>
>
> The main points in this patch are:
>  - lowering the number of copy of the RIT list
>  - lowering the number of synchronization
>  - synchronizing on a region rather than on everything
> It also contains:
>  - some fixes around the RIT notification: the list was sometimes modified without a corresponding 'notify'.
>  - some tests flakiness correction, actually unrelated to this patch.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira

        

[jira] [Commented] (HBASE-6109) Improve RIT performances during assignment on large clusters

Posted by "nkeywal (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/HBASE-6109?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13283858#comment-13283858 ] 

nkeywal commented on HBASE-6109:
--------------------------------

Here it is. I haven't merged it with trunk, as I don't know yet the impact of the modules and I expect many commits the next few days :-).
                
> Improve RIT performances during assignment on large clusters
> ------------------------------------------------------------
>
>                 Key: HBASE-6109
>                 URL: https://issues.apache.org/jira/browse/HBASE-6109
>             Project: HBase
>          Issue Type: Improvement
>          Components: master
>    Affects Versions: 0.96.0
>            Reporter: nkeywal
>            Assignee: nkeywal
>            Priority: Minor
>         Attachments: 6109.v7.patch
>
>
> The main points in this patch are:
>  - lowering the number of copy of the RIT list
>  - lowering the number of synchronization
>  - synchronizing on a region rather than on everything
> It also contains:
>  - some fixes around the RIT notification: the list was sometimes modified without a corresponding 'notify'.
>  - some tests flakiness correction, actually unrelated to this patch.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira

        

[jira] [Commented] (HBASE-6109) Improve RIT performances during assignment on large clusters

Posted by "nkeywal (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/HBASE-6109?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13286452#comment-13286452 ] 

nkeywal commented on HBASE-6109:
--------------------------------

Hopefully we're done :-)
v23 contains the fix for the issue above (unrelated to my changes) + the merge with the trunk as of now... I've executed locally 100 times TestAssignmentManager without getting any error.
                
> Improve RIT performances during assignment on large clusters
> ------------------------------------------------------------
>
>                 Key: HBASE-6109
>                 URL: https://issues.apache.org/jira/browse/HBASE-6109
>             Project: HBase
>          Issue Type: Improvement
>          Components: master
>    Affects Versions: 0.96.0
>            Reporter: nkeywal
>            Assignee: nkeywal
>            Priority: Minor
>             Fix For: 0.96.0
>
>         Attachments: 6109.v19.patch, 6109.v21.patch, 6109.v23.patch, 6109.v7.patch
>
>
> The main points in this patch are:
>  - lowering the number of copy of the RIT list
>  - lowering the number of synchronization
>  - synchronizing on a region rather than on everything
> It also contains:
>  - some fixes around the RIT notification: the list was sometimes modified without a corresponding 'notify'.
>  - some tests flakiness correction, actually unrelated to this patch.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira

        

[jira] [Commented] (HBASE-6109) Improve RIT performances during assignment on large clusters

Posted by "Zhihong Yu (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/HBASE-6109?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13286725#comment-13286725 ] 

Zhihong Yu commented on HBASE-6109:
-----------------------------------

Integrated to trunk.

Thanks for the patch, N.

Thanks for the review, Stack and Ram.
                
> Improve RIT performances during assignment on large clusters
> ------------------------------------------------------------
>
>                 Key: HBASE-6109
>                 URL: https://issues.apache.org/jira/browse/HBASE-6109
>             Project: HBase
>          Issue Type: Improvement
>          Components: master
>    Affects Versions: 0.96.0
>            Reporter: nkeywal
>            Assignee: nkeywal
>            Priority: Minor
>             Fix For: 0.96.0
>
>         Attachments: 6109.v19.patch, 6109.v21.patch, 6109.v23.patch, 6109.v24.patch, 6109.v7.patch
>
>
> The main points in this patch are:
>  - lowering the number of copy of the RIT list
>  - lowering the number of synchronization
>  - synchronizing on a region rather than on everything
> It also contains:
>  - some fixes around the RIT notification: the list was sometimes modified without a corresponding 'notify'.
>  - some tests flakiness correction, actually unrelated to this patch.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira

        

[jira] [Updated] (HBASE-6109) Improve RIT performances during assignment on large clusters

Posted by "nkeywal (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/HBASE-6109?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

nkeywal updated HBASE-6109:
---------------------------

    Status: Open  (was: Patch Available)
    
> Improve RIT performances during assignment on large clusters
> ------------------------------------------------------------
>
>                 Key: HBASE-6109
>                 URL: https://issues.apache.org/jira/browse/HBASE-6109
>             Project: HBase
>          Issue Type: Improvement
>          Components: master
>    Affects Versions: 0.96.0
>            Reporter: nkeywal
>            Assignee: nkeywal
>            Priority: Minor
>             Fix For: 0.96.0
>
>         Attachments: 6109.v19.patch, 6109.v21.patch, 6109.v23.patch, 6109.v7.patch
>
>
> The main points in this patch are:
>  - lowering the number of copy of the RIT list
>  - lowering the number of synchronization
>  - synchronizing on a region rather than on everything
> It also contains:
>  - some fixes around the RIT notification: the list was sometimes modified without a corresponding 'notify'.
>  - some tests flakiness correction, actually unrelated to this patch.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira

        

[jira] [Commented] (HBASE-6109) Improve RIT performances during assignment on large clusters

Posted by "Hadoop QA (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/HBASE-6109?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13286532#comment-13286532 ] 

Hadoop QA commented on HBASE-6109:
----------------------------------

-1 overall.  Here are the results of testing the latest attachment 
  http://issues.apache.org/jira/secure/attachment/12530378/6109.v24.patch
  against trunk revision .

    +1 @author.  The patch does not contain any @author tags.

    +1 tests included.  The patch appears to include 26 new or modified tests.

    +1 hadoop2.0.  The patch compiles against the hadoop 2.0 profile.

    +1 javadoc.  The javadoc tool did not generate any warning messages.

    +1 javac.  The applied patch does not increase the total number of javac compiler warnings.

    -1 findbugs.  The patch appears to cause Findbugs (version 1.3.9) to fail.

    +1 release audit.  The applied patch does not increase the total number of release audit warnings.

     -1 core tests.  The patch failed these unit tests:
                       org.apache.hadoop.hbase.regionserver.TestServerCustomProtocol

Test results: https://builds.apache.org/job/PreCommit-HBASE-Build/2076//testReport/
Console output: https://builds.apache.org/job/PreCommit-HBASE-Build/2076//console

This message is automatically generated.
                
> Improve RIT performances during assignment on large clusters
> ------------------------------------------------------------
>
>                 Key: HBASE-6109
>                 URL: https://issues.apache.org/jira/browse/HBASE-6109
>             Project: HBase
>          Issue Type: Improvement
>          Components: master
>    Affects Versions: 0.96.0
>            Reporter: nkeywal
>            Assignee: nkeywal
>            Priority: Minor
>             Fix For: 0.96.0
>
>         Attachments: 6109.v19.patch, 6109.v21.patch, 6109.v23.patch, 6109.v24.patch, 6109.v7.patch
>
>
> The main points in this patch are:
>  - lowering the number of copy of the RIT list
>  - lowering the number of synchronization
>  - synchronizing on a region rather than on everything
> It also contains:
>  - some fixes around the RIT notification: the list was sometimes modified without a corresponding 'notify'.
>  - some tests flakiness correction, actually unrelated to this patch.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira

        

[jira] [Updated] (HBASE-6109) Improve RIT performances during assignment on large clusters

Posted by "nkeywal (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/HBASE-6109?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

nkeywal updated HBASE-6109:
---------------------------

    Status: Patch Available  (was: Open)
    
> Improve RIT performances during assignment on large clusters
> ------------------------------------------------------------
>
>                 Key: HBASE-6109
>                 URL: https://issues.apache.org/jira/browse/HBASE-6109
>             Project: HBase
>          Issue Type: Improvement
>          Components: master
>    Affects Versions: 0.96.0
>            Reporter: nkeywal
>            Assignee: nkeywal
>            Priority: Minor
>             Fix For: 0.96.0
>
>         Attachments: 6109.v19.patch, 6109.v21.patch, 6109.v23.patch, 6109.v24.patch, 6109.v7.patch
>
>
> The main points in this patch are:
>  - lowering the number of copy of the RIT list
>  - lowering the number of synchronization
>  - synchronizing on a region rather than on everything
> It also contains:
>  - some fixes around the RIT notification: the list was sometimes modified without a corresponding 'notify'.
>  - some tests flakiness correction, actually unrelated to this patch.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira

        

[jira] [Updated] (HBASE-6109) Improve RIT performances during assignment on large clusters

Posted by "nkeywal (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/HBASE-6109?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

nkeywal updated HBASE-6109:
---------------------------

    Attachment: 6109.v7.patch
    
> Improve RIT performances during assignment on large clusters
> ------------------------------------------------------------
>
>                 Key: HBASE-6109
>                 URL: https://issues.apache.org/jira/browse/HBASE-6109
>             Project: HBase
>          Issue Type: Improvement
>          Components: master
>    Affects Versions: 0.96.0
>            Reporter: nkeywal
>            Assignee: nkeywal
>            Priority: Minor
>         Attachments: 6109.v7.patch
>
>
> The main points in this patch are:
>  - lowering the number of copy of the RIT list
>  - lowering the number of synchronization
>  - synchronizing on a region rather than on everything
> It also contains:
>  - some fixes around the RIT notification: the list was sometimes modified without a corresponding 'notify'.
>  - some tests flakiness correction, actually unrelated to this patch.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira

        

[jira] [Updated] (HBASE-6109) Improve RIT performances during assignment on large clusters

Posted by "nkeywal (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/HBASE-6109?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

nkeywal updated HBASE-6109:
---------------------------

    Resolution: Fixed
        Status: Resolved  (was: Patch Available)
    
> Improve RIT performances during assignment on large clusters
> ------------------------------------------------------------
>
>                 Key: HBASE-6109
>                 URL: https://issues.apache.org/jira/browse/HBASE-6109
>             Project: HBase
>          Issue Type: Improvement
>          Components: master
>    Affects Versions: 0.96.0
>            Reporter: nkeywal
>            Assignee: nkeywal
>            Priority: Minor
>             Fix For: 0.96.0
>
>         Attachments: 6109.v19.patch, 6109.v21.patch, 6109.v23.patch, 6109.v24.patch, 6109.v7.patch
>
>
> The main points in this patch are:
>  - lowering the number of copy of the RIT list
>  - lowering the number of synchronization
>  - synchronizing on a region rather than on everything
> It also contains:
>  - some fixes around the RIT notification: the list was sometimes modified without a corresponding 'notify'.
>  - some tests flakiness correction, actually unrelated to this patch.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira

        

[jira] [Commented] (HBASE-6109) Improve RIT performances during assignment on large clusters

Posted by "stack (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/HBASE-6109?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13285914#comment-13285914 ] 

stack commented on HBASE-6109:
------------------------------

Thanks N.
                
> Improve RIT performances during assignment on large clusters
> ------------------------------------------------------------
>
>                 Key: HBASE-6109
>                 URL: https://issues.apache.org/jira/browse/HBASE-6109
>             Project: HBase
>          Issue Type: Improvement
>          Components: master
>    Affects Versions: 0.96.0
>            Reporter: nkeywal
>            Assignee: nkeywal
>            Priority: Minor
>             Fix For: 0.96.0
>
>         Attachments: 6109.v19.patch, 6109.v21.patch, 6109.v7.patch
>
>
> The main points in this patch are:
>  - lowering the number of copy of the RIT list
>  - lowering the number of synchronization
>  - synchronizing on a region rather than on everything
> It also contains:
>  - some fixes around the RIT notification: the list was sometimes modified without a corresponding 'notify'.
>  - some tests flakiness correction, actually unrelated to this patch.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira

        

[jira] [Commented] (HBASE-6109) Improve RIT performances during assignment on large clusters

Posted by "Zhihong Yu (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/HBASE-6109?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13285809#comment-13285809 ] 

Zhihong Yu commented on HBASE-6109:
-----------------------------------

Rename TestLocker class to TestKeyLocker ?
{code}
+    // It has no reason to be a lock shares with the other operations.
{code}
'shares with' -> 'shared with'

Indentation in AssignmentManager.addToRITandCallClose() was off. It would be nice to correct the existing lines.
{code}
+    // No lock concurrency: adding a share synchronized here would not prevent to have two
+    //  entries as we don't check if the region is already there. This must be ensured by the
+    //  method callers. todo nli: check
{code}
'share synchronized' -> 'synchronized'. Remove the 'todo nli:' at the end.
{code}
-    Map<String, RegionPlan> plans=new HashMap<String, RegionPlan>();
+    Map<String, RegionPlan> plans=new HashMap<String, RegionPlan>(regions.size());
{code}
Insert spaces around = sign.
{code}
+   * @return True if none of the regions in the set are in transition
{code}
'are in' -> 'is in'
{code}
+  public NavigableMap<K, V> copyMap() {
+    return delegatee.clone();
{code}
Why not call the method clone() ?
{code}
+  public void clear() {
+    if (!delegatee.isEmpty()) {
+      synchronized (delegatee) {
{code}
Suppose delegatee is empty upon entry to the above method, what if an entry is added after the isEmpty() check ?
                
> Improve RIT performances during assignment on large clusters
> ------------------------------------------------------------
>
>                 Key: HBASE-6109
>                 URL: https://issues.apache.org/jira/browse/HBASE-6109
>             Project: HBase
>          Issue Type: Improvement
>          Components: master
>    Affects Versions: 0.96.0
>            Reporter: nkeywal
>            Assignee: nkeywal
>            Priority: Minor
>             Fix For: 0.96.0
>
>         Attachments: 6109.v19.patch, 6109.v7.patch
>
>
> The main points in this patch are:
>  - lowering the number of copy of the RIT list
>  - lowering the number of synchronization
>  - synchronizing on a region rather than on everything
> It also contains:
>  - some fixes around the RIT notification: the list was sometimes modified without a corresponding 'notify'.
>  - some tests flakiness correction, actually unrelated to this patch.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira

        

[jira] [Commented] (HBASE-6109) Improve RIT performances during assignment on large clusters

Posted by "Zhihong Yu (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/HBASE-6109?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13284573#comment-13284573 ] 

Zhihong Yu commented on HBASE-6109:
-----------------------------------

It would be nice to have a test for NotifiableConcurrentSkipListMap.
{code}
+  public void waitListUpdate(long timeout) throws InterruptedException {
+    synchronized (internalList){
{code}
Since internalList is actually a Map, name the above method waitForUpdate() ?
{code}
+  public void clear() {
+    if (!internalList.isEmpty()) {
+      synchronized (internalList) {
{code}
Is it possible that internalList becomes empty after entering the synchronized block ?

For Locker,
{code}
+ * An utility class to manage a set of lock. Each lock is identified by a String who serves
{code}
the above should read 'A utility class to manage a set of locks. Each lock is identified by a String which serves'
{code}
+public class Locker {
+  private static final Log LOG = LogFactory.getLog(AssignmentManager.class);
{code}
It should be Locker.class
{code}
+  private static final int NB_CONCURRENT_LOCK = 1000;
{code}
The constant should be named NB_CONCURRENT_LOCKS.
{code}
+   * Return a lock for the given key. The lock is already lockek.
{code}
The last word should be locked.
{code}
+      String message = "Can't release the lock for " + key;
{code}
It would be nice to add more about reason.
{code}
-    synchronized (regionsInTransition) {
-      nodes.removeAll(regionsInTransition.keySet());
-    }
+    // no lock concurrent access ok: some threads may be adding/removing items but its java-valid
+    nodes.removeAll(regionsInTransition.keySet());
{code}
Looking at batchRemove() of http://www.docjar.com/html/api/java/util/ArrayList.java.html around line 669, I don't see synchronization.
Meaning, existence check of elements from nodes in regionsInTransition.keySet() may not be deterministic.

                
> Improve RIT performances during assignment on large clusters
> ------------------------------------------------------------
>
>                 Key: HBASE-6109
>                 URL: https://issues.apache.org/jira/browse/HBASE-6109
>             Project: HBase
>          Issue Type: Improvement
>          Components: master
>    Affects Versions: 0.96.0
>            Reporter: nkeywal
>            Assignee: nkeywal
>            Priority: Minor
>         Attachments: 6109.v7.patch
>
>
> The main points in this patch are:
>  - lowering the number of copy of the RIT list
>  - lowering the number of synchronization
>  - synchronizing on a region rather than on everything
> It also contains:
>  - some fixes around the RIT notification: the list was sometimes modified without a corresponding 'notify'.
>  - some tests flakiness correction, actually unrelated to this patch.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira

        

[jira] [Commented] (HBASE-6109) Improve RIT performances during assignment on large clusters

Posted by "ramkrishna.s.vasudevan (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/HBASE-6109?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13286463#comment-13286463 ] 

ramkrishna.s.vasudevan commented on HBASE-6109:
-----------------------------------------------

@N
Minor nits.
{code}
/*
       this.watcher.close();
+      try {
+        Thread.sleep(2000);
+      } catch (InterruptedException e) {
+      }  */
{code}
Some commented line is there. Just had a glance on the latest patch.  
                
> Improve RIT performances during assignment on large clusters
> ------------------------------------------------------------
>
>                 Key: HBASE-6109
>                 URL: https://issues.apache.org/jira/browse/HBASE-6109
>             Project: HBase
>          Issue Type: Improvement
>          Components: master
>    Affects Versions: 0.96.0
>            Reporter: nkeywal
>            Assignee: nkeywal
>            Priority: Minor
>             Fix For: 0.96.0
>
>         Attachments: 6109.v19.patch, 6109.v21.patch, 6109.v23.patch, 6109.v7.patch
>
>
> The main points in this patch are:
>  - lowering the number of copy of the RIT list
>  - lowering the number of synchronization
>  - synchronizing on a region rather than on everything
> It also contains:
>  - some fixes around the RIT notification: the list was sometimes modified without a corresponding 'notify'.
>  - some tests flakiness correction, actually unrelated to this patch.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira