You are viewing a plain text version of this content. The canonical link for it is here.

Posted to issues@hbase.apache.org by "nkeywal (JIRA)" <ji...@apache.org> on 2012/04/25 21:14:17 UTC

[jira] [Created] (HBASE-5877) When a query fails because the region has moved, let the regionserver returns the new address to the client

nkeywal created HBASE-5877:
------------------------------

Summary: When a query fails because the region has moved, let the regionserver returns the new address to the client
Key: HBASE-5877
URL: https://issues.apache.org/jira/browse/HBASE-5877
Project: HBase
Issue Type: Improvement
Components: client, master, regionserver
Affects Versions: 0.96.0
Reporter: nkeywal
Assignee: nkeywal
Priority: Minor

This is mainly useful when we do a rolling restart. This will decrease the load on the master and the network load.

Note that a region is not immediately opened after a close. So:
- it seems preferable to wait before retrying on the other server. An optimisation would be to have an heuristic depending on when the region was closed.
- during a rolling restart, the server moves the regions then stops. So we may have failures when the server is stopped, and this patch won't help.

The implementation in the first patch does:
- on the region move, there is an added parameter on the regionserver#close to say where we are sending the region
- the regionserver keeps a list of what was moved. Each entry is kept 100 seconds.
- the regionserver sends a specific exception when it receives a query on a moved region. This exception contains the new address.
- the client analyses the exeptions and update its cache accordingly...

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira

[jira] [Commented] (HBASE-5877) When a query fails because the region has moved, let the regionserver return the new address to the client

Posted by "nkeywal (JIRA)" <ji...@apache.org>.

    [ https://issues.apache.org/jira/browse/HBASE-5877?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13268199#comment-13268199 ] 

nkeywal commented on HBASE-5877:
--------------------------------

@stack

bq. You don't want to have RegionMovedException carry a ServerName#toString instead of host and port?
I think it's safer this way, as I have to parse the string afterward. Otherwise, if someone modifies ServerName#toString he will break the parsing in RegionMovedException, a class he may never have heard of (yes, it will break the unit test :-))

bq. Is this a bug fix?
Unfortunately, it's a feature. The error management is duplicated, and I have to manage both cases, because we don't have the exception when we come back to the result later in the code.

bq. Put the history of moved regions out into its own class?
You're right, it would be better. Wil do.

bq. Don't presize this I'd say: private static final long TIMEOUT_REGION_MOVED = (2L * 60L * 1000L);
You would prefer a configurable value? 


bq. Stuff is lazily cleared from movedRegions? Should we have a cleaner come visit occasionally?
Aggreed, it would be better with a cleaner. Will do as well.

bq. Why you say the above? When we protobuf it, it'll just be an option so it shouldn't be too bad?
Yeah, if it was too bad I would not have proposed it :-). It's an imperfection to accept I think. We would not have it if we share the regions state within the cluster with ZK.

@ted
bq. Under what condition would newHrl be null above ?
Oops. Refactoring error. Removed.

bq. Please remove the space between newHrl and ')' below:
Done.

bq. Would the above code result in NPE since I see the following in javadoc:
It should not happen because we test hrl value before. But I added a check on the arguments to make it safer.

bq. Since updateCachedLocations() is used to handle exception, the presizing above may not be needed.

bq. Since updateCachedLocations() is used to handle exception, the presizing above may not be needed.
Yeah, I sized it thinking: if we're doing a rolling restart we may have 100 regions with a wrong location if we're really unlucky. As it small, any solution would work here, but I prefer to have the size explicitly set, as it says "I though about it, that's a reasonable size". I added a comment however. 

bq. The indentation of CloseRegionHandler() above is off.
Fixed.


bq. 'will contains' -> 'will contain'. 'keep a too old' -> 'keep too old'.
Fixed.


                
> When a query fails because the region has moved, let the regionserver return the new address to the client
> ----------------------------------------------------------------------------------------------------------
>
>                 Key: HBASE-5877
>                 URL: https://issues.apache.org/jira/browse/HBASE-5877
>             Project: HBase
>          Issue Type: Improvement
>          Components: client, master, regionserver
>    Affects Versions: 0.96.0
>            Reporter: nkeywal
>            Assignee: nkeywal
>            Priority: Minor
>             Fix For: 0.96.0
>
>         Attachments: 5877.v1.patch, 5877.v12.patch, 5877.v6.patch
>
>
> This is mainly useful when we do a rolling restart. This will decrease the load on the master and the network load.
> Note that a region is not immediately opened after a close. So:
> - it seems preferable to wait before retrying on the other server. An optimisation would be to have an heuristic depending on when the region was closed.
> - during a rolling restart, the server moves the regions then stops. So we may have failures when the server is stopped, and this patch won't help.
> The implementation in the first patch does:
> - on the region move, there is an added parameter on the regionserver#close to say where we are sending the region
> - the regionserver keeps a list of what was moved. Each entry is kept 100 seconds.
> - the regionserver sends a specific exception when it receives a query on a moved region. This exception contains the new address.
> - the client analyses the exeptions and update its cache accordingly...

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira

[jira] [Commented] (HBASE-5877) When a query fails because the region has moved, let the regionserver return the new address to the client

Posted by "Hadoop QA (JIRA)" <ji...@apache.org>.

    [ https://issues.apache.org/jira/browse/HBASE-5877?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13271256#comment-13271256 ] 

Hadoop QA commented on HBASE-5877:
----------------------------------

+1 overall.  Here are the results of testing the latest attachment 
  http://issues.apache.org/jira/secure/attachment/12526123/5877.v18.patch
  against trunk revision .

    +1 @author.  The patch does not contain any @author tags.

    +1 tests included.  The patch appears to include 9 new or modified tests.

    +1 hadoop23.  The patch compiles against the hadoop 0.23.x profile.

    +1 javadoc.  The javadoc tool did not generate any warning messages.

    +1 javac.  The applied patch does not increase the total number of javac compiler warnings.

    +1 findbugs.  The patch does not introduce any new Findbugs (version 1.3.9) warnings.

    +1 release audit.  The applied patch does not increase the total number of release audit warnings.

    +1 core tests.  The patch passed unit tests in .

Test results: https://builds.apache.org/job/PreCommit-HBASE-Build/1812//testReport/
Findbugs warnings: https://builds.apache.org/job/PreCommit-HBASE-Build/1812//artifact/trunk/patchprocess/newPatchFindbugsWarnings.html
Console output: https://builds.apache.org/job/PreCommit-HBASE-Build/1812//console

This message is automatically generated.
                
> When a query fails because the region has moved, let the regionserver return the new address to the client
> ----------------------------------------------------------------------------------------------------------
>
>                 Key: HBASE-5877
>                 URL: https://issues.apache.org/jira/browse/HBASE-5877
>             Project: HBase
>          Issue Type: Improvement
>          Components: client, master, regionserver
>    Affects Versions: 0.96.0
>            Reporter: nkeywal
>            Assignee: nkeywal
>            Priority: Minor
>             Fix For: 0.96.0
>
>         Attachments: 5877-v16.txt, 5877-v17.txt, 5877-v17.txt, 5877.v1.patch, 5877.v12.patch, 5877.v15.patch, 5877.v18.patch, 5877.v18.patch, 5877.v6.patch
>
>
> This is mainly useful when we do a rolling restart. This will decrease the load on the master and the network load.
> Note that a region is not immediately opened after a close. So:
> - it seems preferable to wait before retrying on the other server. An optimisation would be to have an heuristic depending on when the region was closed.
> - during a rolling restart, the server moves the regions then stops. So we may have failures when the server is stopped, and this patch won't help.
> The implementation in the first patch does:
> - on the region move, there is an added parameter on the regionserver#close to say where we are sending the region
> - the regionserver keeps a list of what was moved. Each entry is kept 100 seconds.
> - the regionserver sends a specific exception when it receives a query on a moved region. This exception contains the new address.
> - the client analyses the exeptions and update its cache accordingly...

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira

[jira] [Commented] (HBASE-5877) When a query fails because the region has moved, let the regionserver return the new address to the client

Posted by "Zhihong Yu (JIRA)" <ji...@apache.org>.

    [ https://issues.apache.org/jira/browse/HBASE-5877?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13267841#comment-13267841 ] 

Zhihong Yu commented on HBASE-5877:
-----------------------------------

{code}
+        HRegionLocation newHrl = new HRegionLocation(hrl.getRegionInfo(), hostname, port);
+        if (newHrl == null ){
+          // Should not happened. Just in case.
{code}
Under what condition would newHrl be null above ?
Please remove the space between newHrl and ')' below:
{code}
+        cacheLocation(hrl.getRegionInfo().getTableName(), newHrl  );
{code}
For updateCachedLocations():
{code}
+        hrl : getCachedLocation(tableName, row.getRow()));
{code}
Would the above code result in NPE since I see the following in javadoc:
{code}
+     * @param row  - can be null if hrl is not null.
{code}

More review comments to follow.
                
> When a query fails because the region has moved, let the regionserver return the new address to the client
> ----------------------------------------------------------------------------------------------------------
>
>                 Key: HBASE-5877
>                 URL: https://issues.apache.org/jira/browse/HBASE-5877
>             Project: HBase
>          Issue Type: Improvement
>          Components: client, master, regionserver
>    Affects Versions: 0.96.0
>            Reporter: nkeywal
>            Assignee: nkeywal
>            Priority: Minor
>             Fix For: 0.96.0
>
>         Attachments: 5877.v1.patch, 5877.v12.patch, 5877.v6.patch
>
>
> This is mainly useful when we do a rolling restart. This will decrease the load on the master and the network load.
> Note that a region is not immediately opened after a close. So:
> - it seems preferable to wait before retrying on the other server. An optimisation would be to have an heuristic depending on when the region was closed.
> - during a rolling restart, the server moves the regions then stops. So we may have failures when the server is stopped, and this patch won't help.
> The implementation in the first patch does:
> - on the region move, there is an added parameter on the regionserver#close to say where we are sending the region
> - the regionserver keeps a list of what was moved. Each entry is kept 100 seconds.
> - the regionserver sends a specific exception when it receives a query on a moved region. This exception contains the new address.
> - the client analyses the exeptions and update its cache accordingly...

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira

[jira] [Commented] (HBASE-5877) When a query fails because the region has moved, let the regionserver return the new address to the client

Posted by "Hadoop QA (JIRA)" <ji...@apache.org>.

    [ https://issues.apache.org/jira/browse/HBASE-5877?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13268778#comment-13268778 ] 

Hadoop QA commented on HBASE-5877:
----------------------------------

-1 overall.  Here are the results of testing the latest attachment 
  http://issues.apache.org/jira/secure/attachment/12525684/5877-v17.txt
  against trunk revision .

    +1 @author.  The patch does not contain any @author tags.

    +1 tests included.  The patch appears to include 9 new or modified tests.

    +1 hadoop23.  The patch compiles against the hadoop 0.23.x profile.

    +1 javadoc.  The javadoc tool did not generate any warning messages.

    +1 javac.  The applied patch does not increase the total number of javac compiler warnings.

    +1 findbugs.  The patch does not introduce any new Findbugs (version 1.3.9) warnings.

    +1 release audit.  The applied patch does not increase the total number of release audit warnings.

     -1 core tests.  The patch failed these unit tests:
     

Test results: https://builds.apache.org/job/PreCommit-HBASE-Build/1775//testReport/
Findbugs warnings: https://builds.apache.org/job/PreCommit-HBASE-Build/1775//artifact/trunk/patchprocess/newPatchFindbugsWarnings.html
Console output: https://builds.apache.org/job/PreCommit-HBASE-Build/1775//console

This message is automatically generated.
                
> When a query fails because the region has moved, let the regionserver return the new address to the client
> ----------------------------------------------------------------------------------------------------------
>
>                 Key: HBASE-5877
>                 URL: https://issues.apache.org/jira/browse/HBASE-5877
>             Project: HBase
>          Issue Type: Improvement
>          Components: client, master, regionserver
>    Affects Versions: 0.96.0
>            Reporter: nkeywal
>            Assignee: nkeywal
>            Priority: Minor
>             Fix For: 0.96.0
>
>         Attachments: 5877-v16.txt, 5877-v17.txt, 5877.v1.patch, 5877.v12.patch, 5877.v15.patch, 5877.v6.patch
>
>
> This is mainly useful when we do a rolling restart. This will decrease the load on the master and the network load.
> Note that a region is not immediately opened after a close. So:
> - it seems preferable to wait before retrying on the other server. An optimisation would be to have an heuristic depending on when the region was closed.
> - during a rolling restart, the server moves the regions then stops. So we may have failures when the server is stopped, and this patch won't help.
> The implementation in the first patch does:
> - on the region move, there is an added parameter on the regionserver#close to say where we are sending the region
> - the regionserver keeps a list of what was moved. Each entry is kept 100 seconds.
> - the regionserver sends a specific exception when it receives a query on a moved region. This exception contains the new address.
> - the client analyses the exeptions and update its cache accordingly...

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira

[jira] [Updated] (HBASE-5877) When a query fails because the region has moved, let the regionserver return the new address to the client

Posted by "nkeywal (JIRA)" <ji...@apache.org>.

     [ https://issues.apache.org/jira/browse/HBASE-5877?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

nkeywal updated HBASE-5877:
---------------------------

    Attachment: 5877.v18.patch
    
> When a query fails because the region has moved, let the regionserver return the new address to the client
> ----------------------------------------------------------------------------------------------------------
>
>                 Key: HBASE-5877
>                 URL: https://issues.apache.org/jira/browse/HBASE-5877
>             Project: HBase
>          Issue Type: Improvement
>          Components: client, master, regionserver
>    Affects Versions: 0.96.0
>            Reporter: nkeywal
>            Assignee: nkeywal
>            Priority: Minor
>             Fix For: 0.96.0
>
>         Attachments: 5877-v16.txt, 5877-v17.txt, 5877-v17.txt, 5877.v1.patch, 5877.v12.patch, 5877.v15.patch, 5877.v18.patch, 5877.v18.patch, 5877.v6.patch
>
>
> This is mainly useful when we do a rolling restart. This will decrease the load on the master and the network load.
> Note that a region is not immediately opened after a close. So:
> - it seems preferable to wait before retrying on the other server. An optimisation would be to have an heuristic depending on when the region was closed.
> - during a rolling restart, the server moves the regions then stops. So we may have failures when the server is stopped, and this patch won't help.
> The implementation in the first patch does:
> - on the region move, there is an added parameter on the regionserver#close to say where we are sending the region
> - the regionserver keeps a list of what was moved. Each entry is kept 100 seconds.
> - the regionserver sends a specific exception when it receives a query on a moved region. This exception contains the new address.
> - the client analyses the exeptions and update its cache accordingly...

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira

[jira] [Commented] (HBASE-5877) When a query fails because the region has moved, let the regionserver return the new address to the client

Posted by "Zhihong Yu (JIRA)" <ji...@apache.org>.

    [ https://issues.apache.org/jira/browse/HBASE-5877?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13262731#comment-13262731 ] 

Zhihong Yu commented on HBASE-5877:
-----------------------------------

@N:
If the testing result is favorable, I think Lars may want it in 0.94 as well.
I think making this feature functional in 0.94 cluster would be a good start.

bq. We could have this by adding the info in zk
A separate discussion should be started w.r.t. the above. This would shift load imposed by clients from master to zk quorum.
                
> When a query fails because the region has moved, let the regionserver return the new address to the client
> ----------------------------------------------------------------------------------------------------------
>
>                 Key: HBASE-5877
>                 URL: https://issues.apache.org/jira/browse/HBASE-5877
>             Project: HBase
>          Issue Type: Improvement
>          Components: client, master, regionserver
>    Affects Versions: 0.96.0
>            Reporter: nkeywal
>            Assignee: nkeywal
>            Priority: Minor
>             Fix For: 0.96.0
>
>         Attachments: 5877.v1.patch
>
>
> This is mainly useful when we do a rolling restart. This will decrease the load on the master and the network load.
> Note that a region is not immediately opened after a close. So:
> - it seems preferable to wait before retrying on the other server. An optimisation would be to have an heuristic depending on when the region was closed.
> - during a rolling restart, the server moves the regions then stops. So we may have failures when the server is stopped, and this patch won't help.
> The implementation in the first patch does:
> - on the region move, there is an added parameter on the regionserver#close to say where we are sending the region
> - the regionserver keeps a list of what was moved. Each entry is kept 100 seconds.
> - the regionserver sends a specific exception when it receives a query on a moved region. This exception contains the new address.
> - the client analyses the exeptions and update its cache accordingly...

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira

[jira] [Updated] (HBASE-5877) When a query fails because the region has moved, let the regionserver return the new address to the client

Posted by "Zhihong Yu (JIRA)" <ji...@apache.org>.

     [ https://issues.apache.org/jira/browse/HBASE-5877?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Zhihong Yu updated HBASE-5877:
------------------------------

    Fix Version/s: 0.96.0
          Summary: When a query fails because the region has moved, let the regionserver return the new address to the client  (was: When a query fails because the region has moved, let the regionserver returns the new address to the client)
    
> When a query fails because the region has moved, let the regionserver return the new address to the client
> ----------------------------------------------------------------------------------------------------------
>
>                 Key: HBASE-5877
>                 URL: https://issues.apache.org/jira/browse/HBASE-5877
>             Project: HBase
>          Issue Type: Improvement
>          Components: client, master, regionserver
>    Affects Versions: 0.96.0
>            Reporter: nkeywal
>            Assignee: nkeywal
>            Priority: Minor
>             Fix For: 0.96.0
>
>         Attachments: 5877.v1.patch
>
>
> This is mainly useful when we do a rolling restart. This will decrease the load on the master and the network load.
> Note that a region is not immediately opened after a close. So:
> - it seems preferable to wait before retrying on the other server. An optimisation would be to have an heuristic depending on when the region was closed.
> - during a rolling restart, the server moves the regions then stops. So we may have failures when the server is stopped, and this patch won't help.
> The implementation in the first patch does:
> - on the region move, there is an added parameter on the regionserver#close to say where we are sending the region
> - the regionserver keeps a list of what was moved. Each entry is kept 100 seconds.
> - the regionserver sends a specific exception when it receives a query on a moved region. This exception contains the new address.
> - the client analyses the exeptions and update its cache accordingly...

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira

[jira] [Commented] (HBASE-5877) When a query fails because the region has moved, let the regionserver return the new address to the client

Posted by "nkeywal (JIRA)" <ji...@apache.org>.

    [ https://issues.apache.org/jira/browse/HBASE-5877?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13271651#comment-13271651 ] 

nkeywal commented on HBASE-5877:
--------------------------------

bq. Now how does the updateCachelocation help here. If for some reason the opening of the region is not yet done and if the client gets RegionMovedException the client will try to contact the RS thinking the region got moved to it.

Yes, exactly. That's why I kept the sleep in the client code even for this RegionMoved. We could optimize this by adding a timestamps, with an heuristic like: "we give two seconds for the region to move after it's closed on the origin server". Sharing the region state in ZK would be a simpler option, as we would know if the region has moved or not.
                
> When a query fails because the region has moved, let the regionserver return the new address to the client
> ----------------------------------------------------------------------------------------------------------
>
>                 Key: HBASE-5877
>                 URL: https://issues.apache.org/jira/browse/HBASE-5877
>             Project: HBase
>          Issue Type: Improvement
>          Components: client, master, regionserver
>    Affects Versions: 0.96.0
>            Reporter: nkeywal
>            Assignee: nkeywal
>            Priority: Minor
>             Fix For: 0.96.0
>
>         Attachments: 5877-v16.txt, 5877-v17.txt, 5877-v17.txt, 5877.v1.patch, 5877.v12.patch, 5877.v15.patch, 5877.v18.patch, 5877.v18.patch, 5877.v6.patch
>
>
> This is mainly useful when we do a rolling restart. This will decrease the load on the master and the network load.
> Note that a region is not immediately opened after a close. So:
> - it seems preferable to wait before retrying on the other server. An optimisation would be to have an heuristic depending on when the region was closed.
> - during a rolling restart, the server moves the regions then stops. So we may have failures when the server is stopped, and this patch won't help.
> The implementation in the first patch does:
> - on the region move, there is an added parameter on the regionserver#close to say where we are sending the region
> - the regionserver keeps a list of what was moved. Each entry is kept 100 seconds.
> - the regionserver sends a specific exception when it receives a query on a moved region. This exception contains the new address.
> - the client analyses the exeptions and update its cache accordingly...

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira

[jira] [Updated] (HBASE-5877) When a query fails because the region has moved, let the regionserver return the new address to the client

Posted by "nkeywal (JIRA)" <ji...@apache.org>.

     [ https://issues.apache.org/jira/browse/HBASE-5877?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

nkeywal updated HBASE-5877:
---------------------------

    Status: Patch Available  (was: Open)
    
> When a query fails because the region has moved, let the regionserver return the new address to the client
> ----------------------------------------------------------------------------------------------------------
>
>                 Key: HBASE-5877
>                 URL: https://issues.apache.org/jira/browse/HBASE-5877
>             Project: HBase
>          Issue Type: Improvement
>          Components: client, master, regionserver
>    Affects Versions: 0.96.0
>            Reporter: nkeywal
>            Assignee: nkeywal
>            Priority: Minor
>             Fix For: 0.96.0
>
>         Attachments: 5877.v1.patch, 5877.v6.patch
>
>
> This is mainly useful when we do a rolling restart. This will decrease the load on the master and the network load.
> Note that a region is not immediately opened after a close. So:
> - it seems preferable to wait before retrying on the other server. An optimisation would be to have an heuristic depending on when the region was closed.
> - during a rolling restart, the server moves the regions then stops. So we may have failures when the server is stopped, and this patch won't help.
> The implementation in the first patch does:
> - on the region move, there is an added parameter on the regionserver#close to say where we are sending the region
> - the regionserver keeps a list of what was moved. Each entry is kept 100 seconds.
> - the regionserver sends a specific exception when it receives a query on a moved region. This exception contains the new address.
> - the client analyses the exeptions and update its cache accordingly...

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira

[jira] [Commented] (HBASE-5877) When a query fails because the region has moved, let the regionserver return the new address to the client

Posted by "Hadoop QA (JIRA)" <ji...@apache.org>.

    [ https://issues.apache.org/jira/browse/HBASE-5877?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13270913#comment-13270913 ] 

Hadoop QA commented on HBASE-5877:
----------------------------------

-1 overall.  Here are the results of testing the latest attachment 
  http://issues.apache.org/jira/secure/attachment/12526047/5877.v18.patch
  against trunk revision .

    +1 @author.  The patch does not contain any @author tags.

    +1 tests included.  The patch appears to include 9 new or modified tests.

    +1 hadoop23.  The patch compiles against the hadoop 0.23.x profile.

    +1 javadoc.  The javadoc tool did not generate any warning messages.

    +1 javac.  The applied patch does not increase the total number of javac compiler warnings.

    +1 findbugs.  The patch does not introduce any new Findbugs (version 1.3.9) warnings.

    +1 release audit.  The applied patch does not increase the total number of release audit warnings.

     -1 core tests.  The patch failed these unit tests:
                       org.apache.hadoop.hbase.regionserver.TestSplitTransactionOnCluster
                  org.apache.hadoop.hbase.replication.TestReplication

Test results: https://builds.apache.org/job/PreCommit-HBASE-Build/1801//testReport/
Findbugs warnings: https://builds.apache.org/job/PreCommit-HBASE-Build/1801//artifact/trunk/patchprocess/newPatchFindbugsWarnings.html
Console output: https://builds.apache.org/job/PreCommit-HBASE-Build/1801//console

This message is automatically generated.
                
> When a query fails because the region has moved, let the regionserver return the new address to the client
> ----------------------------------------------------------------------------------------------------------
>
>                 Key: HBASE-5877
>                 URL: https://issues.apache.org/jira/browse/HBASE-5877
>             Project: HBase
>          Issue Type: Improvement
>          Components: client, master, regionserver
>    Affects Versions: 0.96.0
>            Reporter: nkeywal
>            Assignee: nkeywal
>            Priority: Minor
>             Fix For: 0.96.0
>
>         Attachments: 5877-v16.txt, 5877-v17.txt, 5877-v17.txt, 5877.v1.patch, 5877.v12.patch, 5877.v15.patch, 5877.v18.patch, 5877.v6.patch
>
>
> This is mainly useful when we do a rolling restart. This will decrease the load on the master and the network load.
> Note that a region is not immediately opened after a close. So:
> - it seems preferable to wait before retrying on the other server. An optimisation would be to have an heuristic depending on when the region was closed.
> - during a rolling restart, the server moves the regions then stops. So we may have failures when the server is stopped, and this patch won't help.
> The implementation in the first patch does:
> - on the region move, there is an added parameter on the regionserver#close to say where we are sending the region
> - the regionserver keeps a list of what was moved. Each entry is kept 100 seconds.
> - the regionserver sends a specific exception when it receives a query on a moved region. This exception contains the new address.
> - the client analyses the exeptions and update its cache accordingly...

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira

[jira] [Commented] (HBASE-5877) When a query fails because the region has moved, let the regionserver return the new address to the client

Posted by "nkeywal (JIRA)" <ji...@apache.org>.

    [ https://issues.apache.org/jira/browse/HBASE-5877?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13268701#comment-13268701 ] 

nkeywal commented on HBASE-5877:
--------------------------------

thank you Ted. It's strange, it though it worked locally. May be this test
was hung and was not reported as an error. Thank you anyway.



                
> When a query fails because the region has moved, let the regionserver return the new address to the client
> ----------------------------------------------------------------------------------------------------------
>
>                 Key: HBASE-5877
>                 URL: https://issues.apache.org/jira/browse/HBASE-5877
>             Project: HBase
>          Issue Type: Improvement
>          Components: client, master, regionserver
>    Affects Versions: 0.96.0
>            Reporter: nkeywal
>            Assignee: nkeywal
>            Priority: Minor
>             Fix For: 0.96.0
>
>         Attachments: 5877-v16.txt, 5877.v1.patch, 5877.v12.patch, 5877.v15.patch, 5877.v6.patch
>
>
> This is mainly useful when we do a rolling restart. This will decrease the load on the master and the network load.
> Note that a region is not immediately opened after a close. So:
> - it seems preferable to wait before retrying on the other server. An optimisation would be to have an heuristic depending on when the region was closed.
> - during a rolling restart, the server moves the regions then stops. So we may have failures when the server is stopped, and this patch won't help.
> The implementation in the first patch does:
> - on the region move, there is an added parameter on the regionserver#close to say where we are sending the region
> - the regionserver keeps a list of what was moved. Each entry is kept 100 seconds.
> - the regionserver sends a specific exception when it receives a query on a moved region. This exception contains the new address.
> - the client analyses the exeptions and update its cache accordingly...

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira

[jira] [Updated] (HBASE-5877) When a query fails because the region has moved, let the regionserver return the new address to the client

Posted by "nkeywal (JIRA)" <ji...@apache.org>.

     [ https://issues.apache.org/jira/browse/HBASE-5877?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

nkeywal updated HBASE-5877:
---------------------------

    Status: Patch Available  (was: Open)
    
> When a query fails because the region has moved, let the regionserver return the new address to the client
> ----------------------------------------------------------------------------------------------------------
>
>                 Key: HBASE-5877
>                 URL: https://issues.apache.org/jira/browse/HBASE-5877
>             Project: HBase
>          Issue Type: Improvement
>          Components: client, master, regionserver
>    Affects Versions: 0.96.0
>            Reporter: nkeywal
>            Assignee: nkeywal
>            Priority: Minor
>             Fix For: 0.96.0
>
>         Attachments: 5877.v1.patch, 5877.v12.patch, 5877.v15.patch, 5877.v6.patch
>
>
> This is mainly useful when we do a rolling restart. This will decrease the load on the master and the network load.
> Note that a region is not immediately opened after a close. So:
> - it seems preferable to wait before retrying on the other server. An optimisation would be to have an heuristic depending on when the region was closed.
> - during a rolling restart, the server moves the regions then stops. So we may have failures when the server is stopped, and this patch won't help.
> The implementation in the first patch does:
> - on the region move, there is an added parameter on the regionserver#close to say where we are sending the region
> - the regionserver keeps a list of what was moved. Each entry is kept 100 seconds.
> - the regionserver sends a specific exception when it receives a query on a moved region. This exception contains the new address.
> - the client analyses the exeptions and update its cache accordingly...

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira

[jira] [Commented] (HBASE-5877) When a query fails because the region has moved, let the regionserver return the new address to the client

Posted by "stack (JIRA)" <ji...@apache.org>.

    [ https://issues.apache.org/jira/browse/HBASE-5877?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13262820#comment-13262820 ] 

stack commented on HBASE-5877:
------------------------------

@N This patch will benefit any move, not just rolling restart, right?

Here are a quick couple of comments on the patch.

I like addition of RegionMovedException.

Convention is to capitalize static defines: '+  private static final String hostField = "hostname=";' so it should be HOSTFIELD.

Its super ugly that you have to parse exception message to get exception data member fields... but thats not your fault.

Please keep the style of the surrounding code.  This kinda thing is unconventional:

{code}
+    private void updateCachedLocations(
+      Set<String> updateHistory,
+      HRegionLocation hrl,
+      Object t) {
{code}

Ugh on how ugly it is updating cache.  Again, not your fault.

Ted suggests updating interface version.  Maybe don't.  If you do, you can't get this into a 0.94.1, etc.

I don't see changes to make use of this new functionality?  I'd expect the balancer in master to make use of it?
                
> When a query fails because the region has moved, let the regionserver return the new address to the client
> ----------------------------------------------------------------------------------------------------------
>
>                 Key: HBASE-5877
>                 URL: https://issues.apache.org/jira/browse/HBASE-5877
>             Project: HBase
>          Issue Type: Improvement
>          Components: client, master, regionserver
>    Affects Versions: 0.96.0
>            Reporter: nkeywal
>            Assignee: nkeywal
>            Priority: Minor
>             Fix For: 0.96.0
>
>         Attachments: 5877.v1.patch
>
>
> This is mainly useful when we do a rolling restart. This will decrease the load on the master and the network load.
> Note that a region is not immediately opened after a close. So:
> - it seems preferable to wait before retrying on the other server. An optimisation would be to have an heuristic depending on when the region was closed.
> - during a rolling restart, the server moves the regions then stops. So we may have failures when the server is stopped, and this patch won't help.
> The implementation in the first patch does:
> - on the region move, there is an added parameter on the regionserver#close to say where we are sending the region
> - the regionserver keeps a list of what was moved. Each entry is kept 100 seconds.
> - the regionserver sends a specific exception when it receives a query on a moved region. This exception contains the new address.
> - the client analyses the exeptions and update its cache accordingly...

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira

[jira] [Commented] (HBASE-5877) When a query fails because the region has moved, let the regionserver return the new address to the client

Posted by "nkeywal (JIRA)" <ji...@apache.org>.

    [ https://issues.apache.org/jira/browse/HBASE-5877?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13266656#comment-13266656 ] 

nkeywal commented on HBASE-5877:
--------------------------------

Generated with protobuf 2.4.1

Here are the things I'm not a big fan, but for which I don't have a better solution:
- the move management in the client code: I think it's possible to change the way we manage error (don't wait for all results before retrying), but that would be for another JIRA
- the destination is the closeRegion interface is a kind of interface hijacking. Other options would be:
    - sharing the region state in zookeeper
    - letting the regionserver calls the master to get the new server. On paper this would be more efficient than a client -> master call. In both cases we could consider that the client should not connect to the master except for cluster administration (create table, split regin; ...). That would increase global reliability. That's for another discussion as well I think.

Here is what I plan to do in the final version
- move the handler functional code into a function in HRegionServer: this would allow to have the function addToMovedRegion as private instead of public.
- Change all the CloseRegionHandler to take a RegionServer instead of a server? I'm not really keen on adding a class RegionServerServices, but may be I should?
- Manage the case when the destination is not specified at the beginning of the move (may be in a different Jira if it's not simple)...

All the previous comment should have been taken into account.
                
> When a query fails because the region has moved, let the regionserver return the new address to the client
> ----------------------------------------------------------------------------------------------------------
>
>                 Key: HBASE-5877
>                 URL: https://issues.apache.org/jira/browse/HBASE-5877
>             Project: HBase
>          Issue Type: Improvement
>          Components: client, master, regionserver
>    Affects Versions: 0.96.0
>            Reporter: nkeywal
>            Assignee: nkeywal
>            Priority: Minor
>             Fix For: 0.96.0
>
>         Attachments: 5877.v1.patch, 5877.v6.patch
>
>
> This is mainly useful when we do a rolling restart. This will decrease the load on the master and the network load.
> Note that a region is not immediately opened after a close. So:
> - it seems preferable to wait before retrying on the other server. An optimisation would be to have an heuristic depending on when the region was closed.
> - during a rolling restart, the server moves the regions then stops. So we may have failures when the server is stopped, and this patch won't help.
> The implementation in the first patch does:
> - on the region move, there is an added parameter on the regionserver#close to say where we are sending the region
> - the regionserver keeps a list of what was moved. Each entry is kept 100 seconds.
> - the regionserver sends a specific exception when it receives a query on a moved region. This exception contains the new address.
> - the client analyses the exeptions and update its cache accordingly...

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira

[jira] [Updated] (HBASE-5877) When a query fails because the region has moved, let the regionserver return the new address to the client

Posted by "nkeywal (JIRA)" <ji...@apache.org>.

     [ https://issues.apache.org/jira/browse/HBASE-5877?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

nkeywal updated HBASE-5877:
---------------------------

    Status: Patch Available  (was: Open)
    
> When a query fails because the region has moved, let the regionserver return the new address to the client
> ----------------------------------------------------------------------------------------------------------
>
>                 Key: HBASE-5877
>                 URL: https://issues.apache.org/jira/browse/HBASE-5877
>             Project: HBase
>          Issue Type: Improvement
>          Components: client, master, regionserver
>    Affects Versions: 0.96.0
>            Reporter: nkeywal
>            Assignee: nkeywal
>            Priority: Minor
>             Fix For: 0.96.0
>
>         Attachments: 5877-v16.txt, 5877-v17.txt, 5877-v17.txt, 5877.v1.patch, 5877.v12.patch, 5877.v15.patch, 5877.v18.patch, 5877.v6.patch
>
>
> This is mainly useful when we do a rolling restart. This will decrease the load on the master and the network load.
> Note that a region is not immediately opened after a close. So:
> - it seems preferable to wait before retrying on the other server. An optimisation would be to have an heuristic depending on when the region was closed.
> - during a rolling restart, the server moves the regions then stops. So we may have failures when the server is stopped, and this patch won't help.
> The implementation in the first patch does:
> - on the region move, there is an added parameter on the regionserver#close to say where we are sending the region
> - the regionserver keeps a list of what was moved. Each entry is kept 100 seconds.
> - the regionserver sends a specific exception when it receives a query on a moved region. This exception contains the new address.
> - the client analyses the exeptions and update its cache accordingly...

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira

[jira] [Commented] (HBASE-5877) When a query fails because the region has moved, let the regionserver return the new address to the client

Posted by "Hadoop QA (JIRA)" <ji...@apache.org>.

    [ https://issues.apache.org/jira/browse/HBASE-5877?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13268734#comment-13268734 ] 

Hadoop QA commented on HBASE-5877:
----------------------------------

-1 overall.  Here are the results of testing the latest attachment 
  http://issues.apache.org/jira/secure/attachment/12525676/5877-v16.txt
  against trunk revision .

    +1 @author.  The patch does not contain any @author tags.

    +1 tests included.  The patch appears to include 9 new or modified tests.

    +1 hadoop23.  The patch compiles against the hadoop 0.23.x profile.

    +1 javadoc.  The javadoc tool did not generate any warning messages.

    +1 javac.  The applied patch does not increase the total number of javac compiler warnings.

    +1 findbugs.  The patch does not introduce any new Findbugs (version 1.3.9) warnings.

    +1 release audit.  The applied patch does not increase the total number of release audit warnings.

     -1 core tests.  The patch failed these unit tests:
                       org.apache.hadoop.hbase.client.TestMultiParallel

Test results: https://builds.apache.org/job/PreCommit-HBASE-Build/1773//testReport/
Findbugs warnings: https://builds.apache.org/job/PreCommit-HBASE-Build/1773//artifact/trunk/patchprocess/newPatchFindbugsWarnings.html
Console output: https://builds.apache.org/job/PreCommit-HBASE-Build/1773//console

This message is automatically generated.
                
> When a query fails because the region has moved, let the regionserver return the new address to the client
> ----------------------------------------------------------------------------------------------------------
>
>                 Key: HBASE-5877
>                 URL: https://issues.apache.org/jira/browse/HBASE-5877
>             Project: HBase
>          Issue Type: Improvement
>          Components: client, master, regionserver
>    Affects Versions: 0.96.0
>            Reporter: nkeywal
>            Assignee: nkeywal
>            Priority: Minor
>             Fix For: 0.96.0
>
>         Attachments: 5877-v16.txt, 5877.v1.patch, 5877.v12.patch, 5877.v15.patch, 5877.v6.patch
>
>
> This is mainly useful when we do a rolling restart. This will decrease the load on the master and the network load.
> Note that a region is not immediately opened after a close. So:
> - it seems preferable to wait before retrying on the other server. An optimisation would be to have an heuristic depending on when the region was closed.
> - during a rolling restart, the server moves the regions then stops. So we may have failures when the server is stopped, and this patch won't help.
> The implementation in the first patch does:
> - on the region move, there is an added parameter on the regionserver#close to say where we are sending the region
> - the regionserver keeps a list of what was moved. Each entry is kept 100 seconds.
> - the regionserver sends a specific exception when it receives a query on a moved region. This exception contains the new address.
> - the client analyses the exeptions and update its cache accordingly...

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira

[jira] [Commented] (HBASE-5877) When a query fails because the region has moved, let the regionserver return the new address to the client

Posted by "nkeywal (JIRA)" <ji...@apache.org>.

    [ https://issues.apache.org/jira/browse/HBASE-5877?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13262462#comment-13262462 ] 

nkeywal commented on HBASE-5877:
--------------------------------

bq. Can we mark the failure and make this RegionMovedException behave the same as NotServingRegionException ?
Done.

bq. For updateCachedLocations(), please put explanation for parameter on the same line as the parameter:
Done.

bq. 'Failed all' -> 'Failed call'
It's an existing comment that we can find again later in the code. It really means "failed all": all the queries on this server failed. I don't mind changing it to something better, but I think we should keep the "all".

bq. 'which the server' -> 'which the region'
Done.

bq. Please increase the VERSION of HRegionInterface
Done.

bq. How is the server removed from cache since I see 'continue' above ?
That's what makes this code complex and difficult to change: the error is actually managed later, when we don't have the real exception anymore.

bq. For ServerManager.sendRegionClose(), please add javadoc for destServerName param.
Done.

bq. Is it possible that destServerName is null ?
Safety checks added.

bq. Please change the above to debug log. && Why is the above fatal (regionResult != null) ? Step 4 appears in a comment below the above code. Should the above say step 3 ?
Bad logs fixed.
                
> When a query fails because the region has moved, let the regionserver return the new address to the client
> ----------------------------------------------------------------------------------------------------------
>
>                 Key: HBASE-5877
>                 URL: https://issues.apache.org/jira/browse/HBASE-5877
>             Project: HBase
>          Issue Type: Improvement
>          Components: client, master, regionserver
>    Affects Versions: 0.96.0
>            Reporter: nkeywal
>            Assignee: nkeywal
>            Priority: Minor
>             Fix For: 0.96.0
>
>         Attachments: 5877.v1.patch
>
>
> This is mainly useful when we do a rolling restart. This will decrease the load on the master and the network load.
> Note that a region is not immediately opened after a close. So:
> - it seems preferable to wait before retrying on the other server. An optimisation would be to have an heuristic depending on when the region was closed.
> - during a rolling restart, the server moves the regions then stops. So we may have failures when the server is stopped, and this patch won't help.
> The implementation in the first patch does:
> - on the region move, there is an added parameter on the regionserver#close to say where we are sending the region
> - the regionserver keeps a list of what was moved. Each entry is kept 100 seconds.
> - the regionserver sends a specific exception when it receives a query on a moved region. This exception contains the new address.
> - the client analyses the exeptions and update its cache accordingly...

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira

[jira] [Commented] (HBASE-5877) When a query fails because the region has moved, let the regionserver return the new address to the client

Posted by "Zhihong Yu (JIRA)" <ji...@apache.org>.

    [ https://issues.apache.org/jira/browse/HBASE-5877?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13268678#comment-13268678 ] 

Zhihong Yu commented on HBASE-5877:
-----------------------------------

I can reproduce the TestMultiParallel failure:
{code}
testFlushCommitsNoAbort(org.apache.hadoop.hbase.client.TestMultiParallel)  Time elapsed: 1.36 sec  <<< ERROR!
java.lang.StackOverflowError
  at org.apache.hadoop.hbase.RegionMovedException.find(RegionMovedException.java:91)
  at org.apache.hadoop.hbase.RegionMovedException.find(RegionMovedException.java:108)
  at org.apache.hadoop.hbase.RegionMovedException.find(RegionMovedException.java:108)
  at org.apache.hadoop.hbase.RegionMovedException.find(RegionMovedException.java:108)
  at org.apache.hadoop.hbase.RegionMovedException.find(RegionMovedException.java:108)
  at org.apache.hadoop.hbase.RegionMovedException.find(RegionMovedException.java:108)
  at org.apache.hadoop.hbase.RegionMovedException.find(RegionMovedException.java:108)
{code}
                
> When a query fails because the region has moved, let the regionserver return the new address to the client
> ----------------------------------------------------------------------------------------------------------
>
>                 Key: HBASE-5877
>                 URL: https://issues.apache.org/jira/browse/HBASE-5877
>             Project: HBase
>          Issue Type: Improvement
>          Components: client, master, regionserver
>    Affects Versions: 0.96.0
>            Reporter: nkeywal
>            Assignee: nkeywal
>            Priority: Minor
>             Fix For: 0.96.0
>
>         Attachments: 5877.v1.patch, 5877.v12.patch, 5877.v15.patch, 5877.v6.patch
>
>
> This is mainly useful when we do a rolling restart. This will decrease the load on the master and the network load.
> Note that a region is not immediately opened after a close. So:
> - it seems preferable to wait before retrying on the other server. An optimisation would be to have an heuristic depending on when the region was closed.
> - during a rolling restart, the server moves the regions then stops. So we may have failures when the server is stopped, and this patch won't help.
> The implementation in the first patch does:
> - on the region move, there is an added parameter on the regionserver#close to say where we are sending the region
> - the regionserver keeps a list of what was moved. Each entry is kept 100 seconds.
> - the regionserver sends a specific exception when it receives a query on a moved region. This exception contains the new address.
> - the client analyses the exeptions and update its cache accordingly...

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira

[jira] [Commented] (HBASE-5877) When a query fails because the region has moved, let the regionserver return the new address to the client

Posted by "ramkrishna.s.vasudevan (JIRA)" <ji...@apache.org>.

    [ https://issues.apache.org/jira/browse/HBASE-5877?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13273157#comment-13273157 ] 

ramkrishna.s.vasudevan commented on HBASE-5877:
-----------------------------------------------

The code is committed.  If the test failures are not related to this, you can resolve this issue N.
                
> When a query fails because the region has moved, let the regionserver return the new address to the client
> ----------------------------------------------------------------------------------------------------------
>
>                 Key: HBASE-5877
>                 URL: https://issues.apache.org/jira/browse/HBASE-5877
>             Project: HBase
>          Issue Type: Improvement
>          Components: client, master, regionserver
>    Affects Versions: 0.96.0
>            Reporter: nkeywal
>            Assignee: nkeywal
>            Priority: Minor
>             Fix For: 0.96.0
>
>         Attachments: 5877-v16.txt, 5877-v17.txt, 5877-v17.txt, 5877.v1.patch, 5877.v12.patch, 5877.v15.patch, 5877.v18.patch, 5877.v18.patch, 5877.v18.patch, 5877.v6.patch
>
>
> This is mainly useful when we do a rolling restart. This will decrease the load on the master and the network load.
> Note that a region is not immediately opened after a close. So:
> - it seems preferable to wait before retrying on the other server. An optimisation would be to have an heuristic depending on when the region was closed.
> - during a rolling restart, the server moves the regions then stops. So we may have failures when the server is stopped, and this patch won't help.
> The implementation in the first patch does:
> - on the region move, there is an added parameter on the regionserver#close to say where we are sending the region
> - the regionserver keeps a list of what was moved. Each entry is kept 100 seconds.
> - the regionserver sends a specific exception when it receives a query on a moved region. This exception contains the new address.
> - the client analyses the exeptions and update its cache accordingly...

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira

[jira] [Commented] (HBASE-5877) When a query fails because the region has moved, let the regionserver return the new address to the client

Posted by "Hadoop QA (JIRA)" <ji...@apache.org>.

    [ https://issues.apache.org/jira/browse/HBASE-5877?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13266785#comment-13266785 ] 

Hadoop QA commented on HBASE-5877:
----------------------------------

-1 overall.  Here are the results of testing the latest attachment 
  http://issues.apache.org/jira/secure/attachment/12525304/5877.v6.patch
  against trunk revision .

    +1 @author.  The patch does not contain any @author tags.

    +1 tests included.  The patch appears to include 3 new or modified tests.

    +1 hadoop23.  The patch compiles against the hadoop 0.23.x profile.

    +1 javadoc.  The javadoc tool did not generate any warning messages.

    +1 javac.  The applied patch does not increase the total number of javac compiler warnings.

    -1 findbugs.  The patch appears to introduce 2 new Findbugs (version 1.3.9) warnings.

    +1 release audit.  The applied patch does not increase the total number of release audit warnings.

     -1 core tests.  The patch failed these unit tests:
                       org.apache.hadoop.hbase.util.TestMergeTable
                  org.apache.hadoop.hbase.io.encoding.TestLoadAndSwitchEncodeOnDisk
                  org.apache.hadoop.hbase.client.TestMultiParallel
                  org.apache.hadoop.hbase.master.TestMasterFailover
                  org.apache.hadoop.hbase.regionserver.TestSplitTransactionOnCluster
                  org.apache.hadoop.hbase.master.TestZKBasedOpenCloseRegion
                  org.apache.hadoop.hbase.coprocessor.TestRowProcessorEndpoint
                  org.apache.hadoop.hbase.client.TestFromClientSide3
                  org.apache.hadoop.hbase.master.TestRollingRestart
                  org.apache.hadoop.hbase.util.TestMiniClusterLoadEncoded
                  org.apache.hadoop.hbase.avro.TestAvroServer
                  org.apache.hadoop.hbase.thrift.TestThriftServerCmdLine

Test results: https://builds.apache.org/job/PreCommit-HBASE-Build/1724//testReport/
Findbugs warnings: https://builds.apache.org/job/PreCommit-HBASE-Build/1724//artifact/trunk/patchprocess/newPatchFindbugsWarnings.html
Console output: https://builds.apache.org/job/PreCommit-HBASE-Build/1724//console

This message is automatically generated.
                
> When a query fails because the region has moved, let the regionserver return the new address to the client
> ----------------------------------------------------------------------------------------------------------
>
>                 Key: HBASE-5877
>                 URL: https://issues.apache.org/jira/browse/HBASE-5877
>             Project: HBase
>          Issue Type: Improvement
>          Components: client, master, regionserver
>    Affects Versions: 0.96.0
>            Reporter: nkeywal
>            Assignee: nkeywal
>            Priority: Minor
>             Fix For: 0.96.0
>
>         Attachments: 5877.v1.patch, 5877.v6.patch
>
>
> This is mainly useful when we do a rolling restart. This will decrease the load on the master and the network load.
> Note that a region is not immediately opened after a close. So:
> - it seems preferable to wait before retrying on the other server. An optimisation would be to have an heuristic depending on when the region was closed.
> - during a rolling restart, the server moves the regions then stops. So we may have failures when the server is stopped, and this patch won't help.
> The implementation in the first patch does:
> - on the region move, there is an added parameter on the regionserver#close to say where we are sending the region
> - the regionserver keeps a list of what was moved. Each entry is kept 100 seconds.
> - the regionserver sends a specific exception when it receives a query on a moved region. This exception contains the new address.
> - the client analyses the exeptions and update its cache accordingly...

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira

[jira] [Commented] (HBASE-5877) When a query fails because the region has moved, let the regionserver return the new address to the client

Posted by "nkeywal (JIRA)" <ji...@apache.org>.

    [ https://issues.apache.org/jira/browse/HBASE-5877?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13263140#comment-13263140 ] 

nkeywal commented on HBASE-5877:
--------------------------------

bq. This patch will benefit any move, not just rolling restart, right?
Yes, but as there is a wait time between two tries, I think the benefit will be minimal vs. the wait time for a single client. I could add an heuristic like if region was closed more than 2 seconds ago, consider that it's now available on the new server and don't sleep before the next retry. That could lead of having more network messages if the rule is wrong (and the rule will be wrong when the system is overloaded), and it will add some complexity to the client code. Having the real status of the region would solve this. 

Anyway, with the dev already done to cut the link between master & clients, it can help to save a reconnect to master. And during a rolling restart with regions moving everywhere, I think it will make a real difference.


bq. I don't see changes to make use of this new functionality? I'd expect the balancer in master to make use of it?
Yes, it's the changes in AssignmentManager: the changes are in the patch, but are quite small at the end: basically:
{noformat}
-    unassign(plan.getRegionInfo());
+    unassign(plan.getRegionInfo(), false, plan.getDestination());
{noformat}

I still need to manage the case when the destination is not specified at the beginning.

                
> When a query fails because the region has moved, let the regionserver return the new address to the client
> ----------------------------------------------------------------------------------------------------------
>
>                 Key: HBASE-5877
>                 URL: https://issues.apache.org/jira/browse/HBASE-5877
>             Project: HBase
>          Issue Type: Improvement
>          Components: client, master, regionserver
>    Affects Versions: 0.96.0
>            Reporter: nkeywal
>            Assignee: nkeywal
>            Priority: Minor
>             Fix For: 0.96.0
>
>         Attachments: 5877.v1.patch
>
>
> This is mainly useful when we do a rolling restart. This will decrease the load on the master and the network load.
> Note that a region is not immediately opened after a close. So:
> - it seems preferable to wait before retrying on the other server. An optimisation would be to have an heuristic depending on when the region was closed.
> - during a rolling restart, the server moves the regions then stops. So we may have failures when the server is stopped, and this patch won't help.
> The implementation in the first patch does:
> - on the region move, there is an added parameter on the regionserver#close to say where we are sending the region
> - the regionserver keeps a list of what was moved. Each entry is kept 100 seconds.
> - the regionserver sends a specific exception when it receives a query on a moved region. This exception contains the new address.
> - the client analyses the exeptions and update its cache accordingly...

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira

[jira] [Commented] (HBASE-5877) When a query fails because the region has moved, let the regionserver return the new address to the client

Posted by "nkeywal (JIRA)" <ji...@apache.org>.

    [ https://issues.apache.org/jira/browse/HBASE-5877?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13270212#comment-13270212 ] 

nkeywal commented on HBASE-5877:
--------------------------------

I didn't find the test in error. Could be committed imho.
                
> When a query fails because the region has moved, let the regionserver return the new address to the client
> ----------------------------------------------------------------------------------------------------------
>
>                 Key: HBASE-5877
>                 URL: https://issues.apache.org/jira/browse/HBASE-5877
>             Project: HBase
>          Issue Type: Improvement
>          Components: client, master, regionserver
>    Affects Versions: 0.96.0
>            Reporter: nkeywal
>            Assignee: nkeywal
>            Priority: Minor
>             Fix For: 0.96.0
>
>         Attachments: 5877-v16.txt, 5877-v17.txt, 5877.v1.patch, 5877.v12.patch, 5877.v15.patch, 5877.v6.patch
>
>
> This is mainly useful when we do a rolling restart. This will decrease the load on the master and the network load.
> Note that a region is not immediately opened after a close. So:
> - it seems preferable to wait before retrying on the other server. An optimisation would be to have an heuristic depending on when the region was closed.
> - during a rolling restart, the server moves the regions then stops. So we may have failures when the server is stopped, and this patch won't help.
> The implementation in the first patch does:
> - on the region move, there is an added parameter on the regionserver#close to say where we are sending the region
> - the regionserver keeps a list of what was moved. Each entry is kept 100 seconds.
> - the regionserver sends a specific exception when it receives a query on a moved region. This exception contains the new address.
> - the client analyses the exeptions and update its cache accordingly...

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira

[jira] [Updated] (HBASE-5877) When a query fails because the region has moved, let the regionserver return the new address to the client

Posted by "nkeywal (JIRA)" <ji...@apache.org>.

     [ https://issues.apache.org/jira/browse/HBASE-5877?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

nkeywal updated HBASE-5877:
---------------------------

    Attachment: 5877.v18.patch
    
> When a query fails because the region has moved, let the regionserver return the new address to the client
> ----------------------------------------------------------------------------------------------------------
>
>                 Key: HBASE-5877
>                 URL: https://issues.apache.org/jira/browse/HBASE-5877
>             Project: HBase
>          Issue Type: Improvement
>          Components: client, master, regionserver
>    Affects Versions: 0.96.0
>            Reporter: nkeywal
>            Assignee: nkeywal
>            Priority: Minor
>             Fix For: 0.96.0
>
>         Attachments: 5877-v16.txt, 5877-v17.txt, 5877-v17.txt, 5877.v1.patch, 5877.v12.patch, 5877.v15.patch, 5877.v18.patch, 5877.v18.patch, 5877.v18.patch, 5877.v6.patch
>
>
> This is mainly useful when we do a rolling restart. This will decrease the load on the master and the network load.
> Note that a region is not immediately opened after a close. So:
> - it seems preferable to wait before retrying on the other server. An optimisation would be to have an heuristic depending on when the region was closed.
> - during a rolling restart, the server moves the regions then stops. So we may have failures when the server is stopped, and this patch won't help.
> The implementation in the first patch does:
> - on the region move, there is an added parameter on the regionserver#close to say where we are sending the region
> - the regionserver keeps a list of what was moved. Each entry is kept 100 seconds.
> - the regionserver sends a specific exception when it receives a query on a moved region. This exception contains the new address.
> - the client analyses the exeptions and update its cache accordingly...

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira

[jira] [Commented] (HBASE-5877) When a query fails because the region has moved, let the regionserver return the new address to the client

Posted by "Jean-Daniel Cryans (JIRA)" <ji...@apache.org>.

    [ https://issues.apache.org/jira/browse/HBASE-5877?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13506935#comment-13506935 ] 

Jean-Daniel Cryans commented on HBASE-5877:
-------------------------------------------

Nicolas,

Do you think this log message could be removed?

{noformat}
12/11/29 15:17:36 INFO client.HConnectionManager$HConnectionImplementation: Region TestTable,0001966229,1354231005211.1bbba78dda968874d2981c322ed3319f. moved from 572ba57e-1cab-4f9c-a071-782e5a1a7184.cs1cloud.internal:60020, updating client location cache. New server: 20590793-0e19-4eb4-b2f6-05de8244f716.cs1cloud.internal:60020
{noformat}

Right now I'm running some loading tests and I'm getting walls of text every time a split happens and it's basically the same message repeated hundreds of times. We used to have a similar message before but we removed it since it's pretty spammy (or we set it to DEBUG, can't remember).
                
> When a query fails because the region has moved, let the regionserver return the new address to the client
> ----------------------------------------------------------------------------------------------------------
>
>                 Key: HBASE-5877
>                 URL: https://issues.apache.org/jira/browse/HBASE-5877
>             Project: HBase
>          Issue Type: Improvement
>          Components: Client, master, regionserver
>    Affects Versions: 0.96.0
>            Reporter: nkeywal
>            Assignee: nkeywal
>            Priority: Minor
>             Fix For: 0.96.0
>
>         Attachments: 5877.v12.patch, 5877.v15.patch, 5877-v16.txt, 5877-v17.txt, 5877-v17.txt, 5877.v18.patch, 5877.v18.patch, 5877.v18.patch, 5877.v1.patch, 5877.v6.patch
>
>
> This is mainly useful when we do a rolling restart. This will decrease the load on the master and the network load.
> Note that a region is not immediately opened after a close. So:
> - it seems preferable to wait before retrying on the other server. An optimisation would be to have an heuristic depending on when the region was closed.
> - during a rolling restart, the server moves the regions then stops. So we may have failures when the server is stopped, and this patch won't help.
> The implementation in the first patch does:
> - on the region move, there is an added parameter on the regionserver#close to say where we are sending the region
> - the regionserver keeps a list of what was moved. Each entry is kept 100 seconds.
> - the regionserver sends a specific exception when it receives a query on a moved region. This exception contains the new address.
> - the client analyses the exeptions and update its cache accordingly...

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira

[jira] [Commented] (HBASE-5877) When a query fails because the region has moved, let the regionserver return the new address to the client

Posted by "Hadoop QA (JIRA)" <ji...@apache.org>.

    [ https://issues.apache.org/jira/browse/HBASE-5877?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13273137#comment-13273137 ] 

Hadoop QA commented on HBASE-5877:
----------------------------------

-1 overall.  Here are the results of testing the latest attachment 
  http://issues.apache.org/jira/secure/attachment/12526501/5877.v18.patch
  against trunk revision .

    +1 @author.  The patch does not contain any @author tags.

    +1 tests included.  The patch appears to include 9 new or modified tests.

    -1 patch.  The patch command could not apply the patch.

Console output: https://builds.apache.org/job/PreCommit-HBASE-Build/1845//console

This message is automatically generated.
                
> When a query fails because the region has moved, let the regionserver return the new address to the client
> ----------------------------------------------------------------------------------------------------------
>
>                 Key: HBASE-5877
>                 URL: https://issues.apache.org/jira/browse/HBASE-5877
>             Project: HBase
>          Issue Type: Improvement
>          Components: client, master, regionserver
>    Affects Versions: 0.96.0
>            Reporter: nkeywal
>            Assignee: nkeywal
>            Priority: Minor
>             Fix For: 0.96.0
>
>         Attachments: 5877-v16.txt, 5877-v17.txt, 5877-v17.txt, 5877.v1.patch, 5877.v12.patch, 5877.v15.patch, 5877.v18.patch, 5877.v18.patch, 5877.v18.patch, 5877.v6.patch
>
>
> This is mainly useful when we do a rolling restart. This will decrease the load on the master and the network load.
> Note that a region is not immediately opened after a close. So:
> - it seems preferable to wait before retrying on the other server. An optimisation would be to have an heuristic depending on when the region was closed.
> - during a rolling restart, the server moves the regions then stops. So we may have failures when the server is stopped, and this patch won't help.
> The implementation in the first patch does:
> - on the region move, there is an added parameter on the regionserver#close to say where we are sending the region
> - the regionserver keeps a list of what was moved. Each entry is kept 100 seconds.
> - the regionserver sends a specific exception when it receives a query on a moved region. This exception contains the new address.
> - the client analyses the exeptions and update its cache accordingly...

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira

[jira] [Updated] (HBASE-5877) When a query fails because the region has moved, let the regionserver return the new address to the client

Posted by "nkeywal (JIRA)" <ji...@apache.org>.

     [ https://issues.apache.org/jira/browse/HBASE-5877?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

nkeywal updated HBASE-5877:
---------------------------

    Status: Open  (was: Patch Available)
    
> When a query fails because the region has moved, let the regionserver return the new address to the client
> ----------------------------------------------------------------------------------------------------------
>
>                 Key: HBASE-5877
>                 URL: https://issues.apache.org/jira/browse/HBASE-5877
>             Project: HBase
>          Issue Type: Improvement
>          Components: client, master, regionserver
>    Affects Versions: 0.96.0
>            Reporter: nkeywal
>            Assignee: nkeywal
>            Priority: Minor
>             Fix For: 0.96.0
>
>         Attachments: 5877-v16.txt, 5877-v17.txt, 5877-v17.txt, 5877.v1.patch, 5877.v12.patch, 5877.v15.patch, 5877.v18.patch, 5877.v6.patch
>
>
> This is mainly useful when we do a rolling restart. This will decrease the load on the master and the network load.
> Note that a region is not immediately opened after a close. So:
> - it seems preferable to wait before retrying on the other server. An optimisation would be to have an heuristic depending on when the region was closed.
> - during a rolling restart, the server moves the regions then stops. So we may have failures when the server is stopped, and this patch won't help.
> The implementation in the first patch does:
> - on the region move, there is an added parameter on the regionserver#close to say where we are sending the region
> - the regionserver keeps a list of what was moved. Each entry is kept 100 seconds.
> - the regionserver sends a specific exception when it receives a query on a moved region. This exception contains the new address.
> - the client analyses the exeptions and update its cache accordingly...

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira

[jira] [Updated] (HBASE-5877) When a query fails because the region has moved, let the regionserver return the new address to the client

Posted by "stack (JIRA)" <ji...@apache.org>.

     [ https://issues.apache.org/jira/browse/HBASE-5877?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

stack updated HBASE-5877:
-------------------------

    Status: Patch Available  (was: Open)
    
> When a query fails because the region has moved, let the regionserver return the new address to the client
> ----------------------------------------------------------------------------------------------------------
>
>                 Key: HBASE-5877
>                 URL: https://issues.apache.org/jira/browse/HBASE-5877
>             Project: HBase
>          Issue Type: Improvement
>          Components: client, master, regionserver
>    Affects Versions: 0.96.0
>            Reporter: nkeywal
>            Assignee: nkeywal
>            Priority: Minor
>             Fix For: 0.96.0
>
>         Attachments: 5877-v16.txt, 5877-v17.txt, 5877-v17.txt, 5877.v1.patch, 5877.v12.patch, 5877.v15.patch, 5877.v6.patch
>
>
> This is mainly useful when we do a rolling restart. This will decrease the load on the master and the network load.
> Note that a region is not immediately opened after a close. So:
> - it seems preferable to wait before retrying on the other server. An optimisation would be to have an heuristic depending on when the region was closed.
> - during a rolling restart, the server moves the regions then stops. So we may have failures when the server is stopped, and this patch won't help.
> The implementation in the first patch does:
> - on the region move, there is an added parameter on the regionserver#close to say where we are sending the region
> - the regionserver keeps a list of what was moved. Each entry is kept 100 seconds.
> - the regionserver sends a specific exception when it receives a query on a moved region. This exception contains the new address.
> - the client analyses the exeptions and update its cache accordingly...

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira

[jira] [Commented] (HBASE-5877) When a query fails because the region has moved, let the regionserver returns the new address to the client

Posted by "nkeywal (JIRA)" <ji...@apache.org>.

    [ https://issues.apache.org/jira/browse/HBASE-5877?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13261972#comment-13261972 ] 

nkeywal commented on HBASE-5877:
--------------------------------

v1. On an old trunk, so it's just to give an overview. Includes some bits of HBASE-5844 as well.

There are 3 workarounds in the implementation:
1) As a ServerName is not serializable we use the String dedicated to this kind of issue. Acceptable I think.
2) hadoop.ipc serialization of exception is based on the #getMessage. So we have to parse it internally. It's not visisble to the exception user. Still acceptable (? :-)
3) The code to manage the error in the client package is quite complex. We have the exception at the very beginning, and then it's checked again, but we don't have the real exception anymore. I used a new "updateList" to make it works, I'm looking for another solution here...
                
> When a query fails because the region has moved, let the regionserver returns the new address to the client
> -----------------------------------------------------------------------------------------------------------
>
>                 Key: HBASE-5877
>                 URL: https://issues.apache.org/jira/browse/HBASE-5877
>             Project: HBase
>          Issue Type: Improvement
>          Components: client, master, regionserver
>    Affects Versions: 0.96.0
>            Reporter: nkeywal
>            Assignee: nkeywal
>            Priority: Minor
>         Attachments: 5877.v1.patch
>
>
> This is mainly useful when we do a rolling restart. This will decrease the load on the master and the network load.
> Note that a region is not immediately opened after a close. So:
> - it seems preferable to wait before retrying on the other server. An optimisation would be to have an heuristic depending on when the region was closed.
> - during a rolling restart, the server moves the regions then stops. So we may have failures when the server is stopped, and this patch won't help.
> The implementation in the first patch does:
> - on the region move, there is an added parameter on the regionserver#close to say where we are sending the region
> - the regionserver keeps a list of what was moved. Each entry is kept 100 seconds.
> - the regionserver sends a specific exception when it receives a query on a moved region. This exception contains the new address.
> - the client analyses the exeptions and update its cache accordingly...

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira

[jira] [Commented] (HBASE-5877) When a query fails because the region has moved, let the regionserver return the new address to the client

Posted by "Hadoop QA (JIRA)" <ji...@apache.org>.

    [ https://issues.apache.org/jira/browse/HBASE-5877?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13267794#comment-13267794 ] 

Hadoop QA commented on HBASE-5877:
----------------------------------

-1 overall.  Here are the results of testing the latest attachment 
  http://issues.apache.org/jira/secure/attachment/12525508/5877.v12.patch
  against trunk revision .

    +1 @author.  The patch does not contain any @author tags.

    +1 tests included.  The patch appears to include 9 new or modified tests.

    +1 hadoop23.  The patch compiles against the hadoop 0.23.x profile.

    +1 javadoc.  The javadoc tool did not generate any warning messages.

    +1 javac.  The applied patch does not increase the total number of javac compiler warnings.

    +1 findbugs.  The patch does not introduce any new Findbugs (version 1.3.9) warnings.

    +1 release audit.  The applied patch does not increase the total number of release audit warnings.

     -1 core tests.  The patch failed these unit tests:
                       org.apache.hadoop.hbase.client.TestMultiParallel

Test results: https://builds.apache.org/job/PreCommit-HBASE-Build/1749//testReport/
Findbugs warnings: https://builds.apache.org/job/PreCommit-HBASE-Build/1749//artifact/trunk/patchprocess/newPatchFindbugsWarnings.html
Console output: https://builds.apache.org/job/PreCommit-HBASE-Build/1749//console

This message is automatically generated.
                
> When a query fails because the region has moved, let the regionserver return the new address to the client
> ----------------------------------------------------------------------------------------------------------
>
>                 Key: HBASE-5877
>                 URL: https://issues.apache.org/jira/browse/HBASE-5877
>             Project: HBase
>          Issue Type: Improvement
>          Components: client, master, regionserver
>    Affects Versions: 0.96.0
>            Reporter: nkeywal
>            Assignee: nkeywal
>            Priority: Minor
>             Fix For: 0.96.0
>
>         Attachments: 5877.v1.patch, 5877.v12.patch, 5877.v6.patch
>
>
> This is mainly useful when we do a rolling restart. This will decrease the load on the master and the network load.
> Note that a region is not immediately opened after a close. So:
> - it seems preferable to wait before retrying on the other server. An optimisation would be to have an heuristic depending on when the region was closed.
> - during a rolling restart, the server moves the regions then stops. So we may have failures when the server is stopped, and this patch won't help.
> The implementation in the first patch does:
> - on the region move, there is an added parameter on the regionserver#close to say where we are sending the region
> - the regionserver keeps a list of what was moved. Each entry is kept 100 seconds.
> - the regionserver sends a specific exception when it receives a query on a moved region. This exception contains the new address.
> - the client analyses the exeptions and update its cache accordingly...

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira

[jira] [Updated] (HBASE-5877) When a query fails because the region has moved, let the regionserver return the new address to the client

Posted by "stack (JIRA)" <ji...@apache.org>.

     [ https://issues.apache.org/jira/browse/HBASE-5877?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

stack updated HBASE-5877:
-------------------------

    Attachment: 5877-v17.txt

Retry
                
> When a query fails because the region has moved, let the regionserver return the new address to the client
> ----------------------------------------------------------------------------------------------------------
>
>                 Key: HBASE-5877
>                 URL: https://issues.apache.org/jira/browse/HBASE-5877
>             Project: HBase
>          Issue Type: Improvement
>          Components: client, master, regionserver
>    Affects Versions: 0.96.0
>            Reporter: nkeywal
>            Assignee: nkeywal
>            Priority: Minor
>             Fix For: 0.96.0
>
>         Attachments: 5877-v16.txt, 5877-v17.txt, 5877-v17.txt, 5877.v1.patch, 5877.v12.patch, 5877.v15.patch, 5877.v6.patch
>
>
> This is mainly useful when we do a rolling restart. This will decrease the load on the master and the network load.
> Note that a region is not immediately opened after a close. So:
> - it seems preferable to wait before retrying on the other server. An optimisation would be to have an heuristic depending on when the region was closed.
> - during a rolling restart, the server moves the regions then stops. So we may have failures when the server is stopped, and this patch won't help.
> The implementation in the first patch does:
> - on the region move, there is an added parameter on the regionserver#close to say where we are sending the region
> - the regionserver keeps a list of what was moved. Each entry is kept 100 seconds.
> - the regionserver sends a specific exception when it receives a query on a moved region. This exception contains the new address.
> - the client analyses the exeptions and update its cache accordingly...

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira

[jira] [Updated] (HBASE-5877) When a query fails because the region has moved, let the regionserver return the new address to the client

Posted by "nkeywal (JIRA)" <ji...@apache.org>.

     [ https://issues.apache.org/jira/browse/HBASE-5877?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

nkeywal updated HBASE-5877:
---------------------------

    Status: Open  (was: Patch Available)
    
> When a query fails because the region has moved, let the regionserver return the new address to the client
> ----------------------------------------------------------------------------------------------------------
>
>                 Key: HBASE-5877
>                 URL: https://issues.apache.org/jira/browse/HBASE-5877
>             Project: HBase
>          Issue Type: Improvement
>          Components: client, master, regionserver
>    Affects Versions: 0.96.0
>            Reporter: nkeywal
>            Assignee: nkeywal
>            Priority: Minor
>             Fix For: 0.96.0
>
>         Attachments: 5877.v1.patch, 5877.v12.patch, 5877.v6.patch
>
>
> This is mainly useful when we do a rolling restart. This will decrease the load on the master and the network load.
> Note that a region is not immediately opened after a close. So:
> - it seems preferable to wait before retrying on the other server. An optimisation would be to have an heuristic depending on when the region was closed.
> - during a rolling restart, the server moves the regions then stops. So we may have failures when the server is stopped, and this patch won't help.
> The implementation in the first patch does:
> - on the region move, there is an added parameter on the regionserver#close to say where we are sending the region
> - the regionserver keeps a list of what was moved. Each entry is kept 100 seconds.
> - the regionserver sends a specific exception when it receives a query on a moved region. This exception contains the new address.
> - the client analyses the exeptions and update its cache accordingly...

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira

[jira] [Updated] (HBASE-5877) When a query fails because the region has moved, let the regionserver return the new address to the client

Posted by "nkeywal (JIRA)" <ji...@apache.org>.

     [ https://issues.apache.org/jira/browse/HBASE-5877?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

nkeywal updated HBASE-5877:
---------------------------

    Status: Open  (was: Patch Available)
    
> When a query fails because the region has moved, let the regionserver return the new address to the client
> ----------------------------------------------------------------------------------------------------------
>
>                 Key: HBASE-5877
>                 URL: https://issues.apache.org/jira/browse/HBASE-5877
>             Project: HBase
>          Issue Type: Improvement
>          Components: client, master, regionserver
>    Affects Versions: 0.96.0
>            Reporter: nkeywal
>            Assignee: nkeywal
>            Priority: Minor
>             Fix For: 0.96.0
>
>         Attachments: 5877.v1.patch, 5877.v12.patch, 5877.v6.patch
>
>
> This is mainly useful when we do a rolling restart. This will decrease the load on the master and the network load.
> Note that a region is not immediately opened after a close. So:
> - it seems preferable to wait before retrying on the other server. An optimisation would be to have an heuristic depending on when the region was closed.
> - during a rolling restart, the server moves the regions then stops. So we may have failures when the server is stopped, and this patch won't help.
> The implementation in the first patch does:
> - on the region move, there is an added parameter on the regionserver#close to say where we are sending the region
> - the regionserver keeps a list of what was moved. Each entry is kept 100 seconds.
> - the regionserver sends a specific exception when it receives a query on a moved region. This exception contains the new address.
> - the client analyses the exeptions and update its cache accordingly...

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira

[jira] [Commented] (HBASE-5877) When a query fails because the region has moved, let the regionserver return the new address to the client

Posted by "nkeywal (JIRA)" <ji...@apache.org>.

    [ https://issues.apache.org/jira/browse/HBASE-5877?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13270263#comment-13270263 ] 

nkeywal commented on HBASE-5877:
--------------------------------

Ok... I will do the merge and provide another patch today...
                
> When a query fails because the region has moved, let the regionserver return the new address to the client
> ----------------------------------------------------------------------------------------------------------
>
>                 Key: HBASE-5877
>                 URL: https://issues.apache.org/jira/browse/HBASE-5877
>             Project: HBase
>          Issue Type: Improvement
>          Components: client, master, regionserver
>    Affects Versions: 0.96.0
>            Reporter: nkeywal
>            Assignee: nkeywal
>            Priority: Minor
>             Fix For: 0.96.0
>
>         Attachments: 5877-v16.txt, 5877-v17.txt, 5877-v17.txt, 5877.v1.patch, 5877.v12.patch, 5877.v15.patch, 5877.v6.patch
>
>
> This is mainly useful when we do a rolling restart. This will decrease the load on the master and the network load.
> Note that a region is not immediately opened after a close. So:
> - it seems preferable to wait before retrying on the other server. An optimisation would be to have an heuristic depending on when the region was closed.
> - during a rolling restart, the server moves the regions then stops. So we may have failures when the server is stopped, and this patch won't help.
> The implementation in the first patch does:
> - on the region move, there is an added parameter on the regionserver#close to say where we are sending the region
> - the regionserver keeps a list of what was moved. Each entry is kept 100 seconds.
> - the regionserver sends a specific exception when it receives a query on a moved region. This exception contains the new address.
> - the client analyses the exeptions and update its cache accordingly...

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira

[jira] [Updated] (HBASE-5877) When a query fails because the region has moved, let the regionserver return the new address to the client

Posted by "nkeywal (JIRA)" <ji...@apache.org>.

     [ https://issues.apache.org/jira/browse/HBASE-5877?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

nkeywal updated HBASE-5877:
---------------------------

    Status: Open  (was: Patch Available)
    
> When a query fails because the region has moved, let the regionserver return the new address to the client
> ----------------------------------------------------------------------------------------------------------
>
>                 Key: HBASE-5877
>                 URL: https://issues.apache.org/jira/browse/HBASE-5877
>             Project: HBase
>          Issue Type: Improvement
>          Components: client, master, regionserver
>    Affects Versions: 0.96.0
>            Reporter: nkeywal
>            Assignee: nkeywal
>            Priority: Minor
>             Fix For: 0.96.0
>
>         Attachments: 5877-v16.txt, 5877-v17.txt, 5877-v17.txt, 5877.v1.patch, 5877.v12.patch, 5877.v15.patch, 5877.v18.patch, 5877.v18.patch, 5877.v18.patch, 5877.v6.patch
>
>
> This is mainly useful when we do a rolling restart. This will decrease the load on the master and the network load.
> Note that a region is not immediately opened after a close. So:
> - it seems preferable to wait before retrying on the other server. An optimisation would be to have an heuristic depending on when the region was closed.
> - during a rolling restart, the server moves the regions then stops. So we may have failures when the server is stopped, and this patch won't help.
> The implementation in the first patch does:
> - on the region move, there is an added parameter on the regionserver#close to say where we are sending the region
> - the regionserver keeps a list of what was moved. Each entry is kept 100 seconds.
> - the regionserver sends a specific exception when it receives a query on a moved region. This exception contains the new address.
> - the client analyses the exeptions and update its cache accordingly...

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira

[jira] [Updated] (HBASE-5877) When a query fails because the region has moved, let the regionserver return the new address to the client

Posted by "nkeywal (JIRA)" <ji...@apache.org>.

     [ https://issues.apache.org/jira/browse/HBASE-5877?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

nkeywal updated HBASE-5877:
---------------------------

    Attachment: 5877.v12.patch
    
> When a query fails because the region has moved, let the regionserver return the new address to the client
> ----------------------------------------------------------------------------------------------------------
>
>                 Key: HBASE-5877
>                 URL: https://issues.apache.org/jira/browse/HBASE-5877
>             Project: HBase
>          Issue Type: Improvement
>          Components: client, master, regionserver
>    Affects Versions: 0.96.0
>            Reporter: nkeywal
>            Assignee: nkeywal
>            Priority: Minor
>             Fix For: 0.96.0
>
>         Attachments: 5877.v1.patch, 5877.v12.patch, 5877.v6.patch
>
>
> This is mainly useful when we do a rolling restart. This will decrease the load on the master and the network load.
> Note that a region is not immediately opened after a close. So:
> - it seems preferable to wait before retrying on the other server. An optimisation would be to have an heuristic depending on when the region was closed.
> - during a rolling restart, the server moves the regions then stops. So we may have failures when the server is stopped, and this patch won't help.
> The implementation in the first patch does:
> - on the region move, there is an added parameter on the regionserver#close to say where we are sending the region
> - the regionserver keeps a list of what was moved. Each entry is kept 100 seconds.
> - the regionserver sends a specific exception when it receives a query on a moved region. This exception contains the new address.
> - the client analyses the exeptions and update its cache accordingly...

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira

[jira] [Updated] (HBASE-5877) When a query fails because the region has moved, let the regionserver return the new address to the client

Posted by "nkeywal (JIRA)" <ji...@apache.org>.

     [ https://issues.apache.org/jira/browse/HBASE-5877?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

nkeywal updated HBASE-5877:
---------------------------

    Status: Patch Available  (was: Open)
    
> When a query fails because the region has moved, let the regionserver return the new address to the client
> ----------------------------------------------------------------------------------------------------------
>
>                 Key: HBASE-5877
>                 URL: https://issues.apache.org/jira/browse/HBASE-5877
>             Project: HBase
>          Issue Type: Improvement
>          Components: client, master, regionserver
>    Affects Versions: 0.96.0
>            Reporter: nkeywal
>            Assignee: nkeywal
>            Priority: Minor
>             Fix For: 0.96.0
>
>         Attachments: 5877-v16.txt, 5877-v17.txt, 5877-v17.txt, 5877.v1.patch, 5877.v12.patch, 5877.v15.patch, 5877.v18.patch, 5877.v18.patch, 5877.v18.patch, 5877.v6.patch
>
>
> This is mainly useful when we do a rolling restart. This will decrease the load on the master and the network load.
> Note that a region is not immediately opened after a close. So:
> - it seems preferable to wait before retrying on the other server. An optimisation would be to have an heuristic depending on when the region was closed.
> - during a rolling restart, the server moves the regions then stops. So we may have failures when the server is stopped, and this patch won't help.
> The implementation in the first patch does:
> - on the region move, there is an added parameter on the regionserver#close to say where we are sending the region
> - the regionserver keeps a list of what was moved. Each entry is kept 100 seconds.
> - the regionserver sends a specific exception when it receives a query on a moved region. This exception contains the new address.
> - the client analyses the exeptions and update its cache accordingly...

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira

[jira] [Commented] (HBASE-5877) When a query fails because the region has moved, let the regionserver return the new address to the client

Posted by "Zhihong Yu (JIRA)" <ji...@apache.org>.

    [ https://issues.apache.org/jira/browse/HBASE-5877?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13262199#comment-13262199 ] 

Zhihong Yu commented on HBASE-5877:
-----------------------------------

For RegionMovedException.java:
{code}
+    String tmpHostname = "nohostname";
{code}
I think the above could potentially be a host name :-)
{code}
+    } catch (Exception ignored) {
+      LOG.warn("Can't parse the hostname and the port from this string: " + s + ", "+
+        "Continuing");
+    }
{code}
Can we mark the failure and make this RegionMovedException behave the same as NotServingRegionException ?
For updateCachedLocations(), please put explanation for parameter on the same line as the parameter:
{code}
+     * @param row  - row and tableName can be null id hrl is not null.
{code}
{code}
+            LOG.warn("Failed all from " + loc, e);
{code}
'Failed all' -> 'Failed call'
{code}
+          if (resp == null) {
+            // Entire server failed
+            LOG.fatal("Failed all for server: " + loc.getHostnamePort() +
+              ", removing from cache");
+            continue;
+          }
{code}
How is the server removed from cache since I see 'continue' above ?
{code}
+              } else {
+                if (numRetries == 1)
+                  LOG.fatal("step 4 got result " + regionResult.getFirst() + " " + regionResult.getSecond());
{code}
Why is the above fatal (regionResult != null) ? Step 4 appears in a comment below the above code. Should the above say step 3 ?

Please increase the VERSION of HRegionInterface
{code}
+   * @param destServerName: server name on which the server will be moved
{code}
'which the server' -> 'which the region'

For ServerManager.sendRegionClose(), please add javadoc for destServerName param.
For HRegionServer.java:
{code}
+    LOG.info("Closing region "+region.getRegionName()+", moving to "+sn.getServerName() );
{code}
Is it possible that destServerName is null ?
{code}
+  private ServerName getMovedRegion(String encodedRegionName) {
+    LOG.fatal("Called getMovedRegion for "+encodedRegionName+" "+ movedRegions.size()+ " "+movedRegions);
{code}
Please change the above to debug log.

                
> When a query fails because the region has moved, let the regionserver return the new address to the client
> ----------------------------------------------------------------------------------------------------------
>
>                 Key: HBASE-5877
>                 URL: https://issues.apache.org/jira/browse/HBASE-5877
>             Project: HBase
>          Issue Type: Improvement
>          Components: client, master, regionserver
>    Affects Versions: 0.96.0
>            Reporter: nkeywal
>            Assignee: nkeywal
>            Priority: Minor
>             Fix For: 0.96.0
>
>         Attachments: 5877.v1.patch
>
>
> This is mainly useful when we do a rolling restart. This will decrease the load on the master and the network load.
> Note that a region is not immediately opened after a close. So:
> - it seems preferable to wait before retrying on the other server. An optimisation would be to have an heuristic depending on when the region was closed.
> - during a rolling restart, the server moves the regions then stops. So we may have failures when the server is stopped, and this patch won't help.
> The implementation in the first patch does:
> - on the region move, there is an added parameter on the regionserver#close to say where we are sending the region
> - the regionserver keeps a list of what was moved. Each entry is kept 100 seconds.
> - the regionserver sends a specific exception when it receives a query on a moved region. This exception contains the new address.
> - the client analyses the exeptions and update its cache accordingly...

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira

[jira] [Commented] (HBASE-5877) When a query fails because the region has moved, let the regionserver return the new address to the client

Posted by "Zhihong Yu (JIRA)" <ji...@apache.org>.

    [ https://issues.apache.org/jira/browse/HBASE-5877?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13271603#comment-13271603 ] 

Zhihong Yu commented on HBASE-5877:
-----------------------------------

Integrated to trunk.

Thanks for the patch, N.

Thanks for the review, Stack.
                
> When a query fails because the region has moved, let the regionserver return the new address to the client
> ----------------------------------------------------------------------------------------------------------
>
>                 Key: HBASE-5877
>                 URL: https://issues.apache.org/jira/browse/HBASE-5877
>             Project: HBase
>          Issue Type: Improvement
>          Components: client, master, regionserver
>    Affects Versions: 0.96.0
>            Reporter: nkeywal
>            Assignee: nkeywal
>            Priority: Minor
>             Fix For: 0.96.0
>
>         Attachments: 5877-v16.txt, 5877-v17.txt, 5877-v17.txt, 5877.v1.patch, 5877.v12.patch, 5877.v15.patch, 5877.v18.patch, 5877.v18.patch, 5877.v6.patch
>
>
> This is mainly useful when we do a rolling restart. This will decrease the load on the master and the network load.
> Note that a region is not immediately opened after a close. So:
> - it seems preferable to wait before retrying on the other server. An optimisation would be to have an heuristic depending on when the region was closed.
> - during a rolling restart, the server moves the regions then stops. So we may have failures when the server is stopped, and this patch won't help.
> The implementation in the first patch does:
> - on the region move, there is an added parameter on the regionserver#close to say where we are sending the region
> - the regionserver keeps a list of what was moved. Each entry is kept 100 seconds.
> - the regionserver sends a specific exception when it receives a query on a moved region. This exception contains the new address.
> - the client analyses the exeptions and update its cache accordingly...

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira

[jira] [Commented] (HBASE-5877) When a query fails because the region has moved, let the regionserver return the new address to the client

Posted by "ramkrishna.s.vasudevan (JIRA)" <ji...@apache.org>.

    [ https://issues.apache.org/jira/browse/HBASE-5877?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13271630#comment-13271630 ] 

ramkrishna.s.vasudevan commented on HBASE-5877:
-----------------------------------------------

Just saw the intent behind this JIRA. It is very important and a good feature.  You have done a lot in this but the JIRA says it is Minor :).

bq.It doesn't know if the region is opened yet. We could have this by adding the info in zk, but it would increase the zk load...
Now how does the updateCachelocation help here.  If for some reason the opening of the region is not yet done and if the client gets RegionMovedException the client will try to contact the RS thinking the region got moved to it.  
Thanks N.

                
> When a query fails because the region has moved, let the regionserver return the new address to the client
> ----------------------------------------------------------------------------------------------------------
>
>                 Key: HBASE-5877
>                 URL: https://issues.apache.org/jira/browse/HBASE-5877
>             Project: HBase
>          Issue Type: Improvement
>          Components: client, master, regionserver
>    Affects Versions: 0.96.0
>            Reporter: nkeywal
>            Assignee: nkeywal
>            Priority: Minor
>             Fix For: 0.96.0
>
>         Attachments: 5877-v16.txt, 5877-v17.txt, 5877-v17.txt, 5877.v1.patch, 5877.v12.patch, 5877.v15.patch, 5877.v18.patch, 5877.v18.patch, 5877.v6.patch
>
>
> This is mainly useful when we do a rolling restart. This will decrease the load on the master and the network load.
> Note that a region is not immediately opened after a close. So:
> - it seems preferable to wait before retrying on the other server. An optimisation would be to have an heuristic depending on when the region was closed.
> - during a rolling restart, the server moves the regions then stops. So we may have failures when the server is stopped, and this patch won't help.
> The implementation in the first patch does:
> - on the region move, there is an added parameter on the regionserver#close to say where we are sending the region
> - the regionserver keeps a list of what was moved. Each entry is kept 100 seconds.
> - the regionserver sends a specific exception when it receives a query on a moved region. This exception contains the new address.
> - the client analyses the exeptions and update its cache accordingly...

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira

[jira] [Updated] (HBASE-5877) When a query fails because the region has moved, let the regionserver return the new address to the client

Posted by "Zhihong Yu (JIRA)" <ji...@apache.org>.

     [ https://issues.apache.org/jira/browse/HBASE-5877?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Zhihong Yu updated HBASE-5877:
------------------------------

    Attachment: 5877-v16.txt

Patch v16 is able to pass TestMultiParallel.

See the following link for the reason of infinite recursion:

http://grepcode.com/file/repository.cloudera.com/content/repositories/releases/com.cloudera.hadoop/hadoop-core/0.20.2-737/org/apache/hadoop/ipc/RemoteException.java#RemoteException.unwrapRemoteException%28%29
                
> When a query fails because the region has moved, let the regionserver return the new address to the client
> ----------------------------------------------------------------------------------------------------------
>
>                 Key: HBASE-5877
>                 URL: https://issues.apache.org/jira/browse/HBASE-5877
>             Project: HBase
>          Issue Type: Improvement
>          Components: client, master, regionserver
>    Affects Versions: 0.96.0
>            Reporter: nkeywal
>            Assignee: nkeywal
>            Priority: Minor
>             Fix For: 0.96.0
>
>         Attachments: 5877-v16.txt, 5877.v1.patch, 5877.v12.patch, 5877.v15.patch, 5877.v6.patch
>
>
> This is mainly useful when we do a rolling restart. This will decrease the load on the master and the network load.
> Note that a region is not immediately opened after a close. So:
> - it seems preferable to wait before retrying on the other server. An optimisation would be to have an heuristic depending on when the region was closed.
> - during a rolling restart, the server moves the regions then stops. So we may have failures when the server is stopped, and this patch won't help.
> The implementation in the first patch does:
> - on the region move, there is an added parameter on the regionserver#close to say where we are sending the region
> - the regionserver keeps a list of what was moved. Each entry is kept 100 seconds.
> - the regionserver sends a specific exception when it receives a query on a moved region. This exception contains the new address.
> - the client analyses the exeptions and update its cache accordingly...

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira

[jira] [Commented] (HBASE-5877) When a query fails because the region has moved, let the regionserver return the new address to the client

Posted by "Hudson (JIRA)" <ji...@apache.org>.

    [ https://issues.apache.org/jira/browse/HBASE-5877?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13271656#comment-13271656 ] 

Hudson commented on HBASE-5877:
-------------------------------

Integrated in HBase-TRUNK #2858 (See [https://builds.apache.org/job/HBase-TRUNK/2858/])
    HBASE-5877 When a query fails because the region has moved, let the regionserver return the new address to the client (N Keywal) (Revision 1336301)

     Result = FAILURE
tedyu : 
Files : 
* /hbase/trunk/src/main/java/org/apache/hadoop/hbase/RegionMovedException.java
* /hbase/trunk/src/main/java/org/apache/hadoop/hbase/client/HConnectionManager.java
* /hbase/trunk/src/main/java/org/apache/hadoop/hbase/master/AssignmentManager.java
* /hbase/trunk/src/main/java/org/apache/hadoop/hbase/master/ServerManager.java
* /hbase/trunk/src/main/java/org/apache/hadoop/hbase/protobuf/ProtobufUtil.java
* /hbase/trunk/src/main/java/org/apache/hadoop/hbase/protobuf/RequestConverter.java
* /hbase/trunk/src/main/java/org/apache/hadoop/hbase/protobuf/generated/AdminProtos.java
* /hbase/trunk/src/main/java/org/apache/hadoop/hbase/regionserver/HRegionServer.java
* /hbase/trunk/src/main/java/org/apache/hadoop/hbase/regionserver/OnlineRegions.java
* /hbase/trunk/src/main/java/org/apache/hadoop/hbase/regionserver/SplitTransaction.java
* /hbase/trunk/src/main/java/org/apache/hadoop/hbase/regionserver/handler/CloseRegionHandler.java
* /hbase/trunk/src/main/protobuf/Admin.proto
* /hbase/trunk/src/test/java/org/apache/hadoop/hbase/client/TestHCM.java
* /hbase/trunk/src/test/java/org/apache/hadoop/hbase/master/MockRegionServer.java
* /hbase/trunk/src/test/java/org/apache/hadoop/hbase/util/MockRegionServerServices.java

                
> When a query fails because the region has moved, let the regionserver return the new address to the client
> ----------------------------------------------------------------------------------------------------------
>
>                 Key: HBASE-5877
>                 URL: https://issues.apache.org/jira/browse/HBASE-5877
>             Project: HBase
>          Issue Type: Improvement
>          Components: client, master, regionserver
>    Affects Versions: 0.96.0
>            Reporter: nkeywal
>            Assignee: nkeywal
>            Priority: Minor
>             Fix For: 0.96.0
>
>         Attachments: 5877-v16.txt, 5877-v17.txt, 5877-v17.txt, 5877.v1.patch, 5877.v12.patch, 5877.v15.patch, 5877.v18.patch, 5877.v18.patch, 5877.v6.patch
>
>
> This is mainly useful when we do a rolling restart. This will decrease the load on the master and the network load.
> Note that a region is not immediately opened after a close. So:
> - it seems preferable to wait before retrying on the other server. An optimisation would be to have an heuristic depending on when the region was closed.
> - during a rolling restart, the server moves the regions then stops. So we may have failures when the server is stopped, and this patch won't help.
> The implementation in the first patch does:
> - on the region move, there is an added parameter on the regionserver#close to say where we are sending the region
> - the regionserver keeps a list of what was moved. Each entry is kept 100 seconds.
> - the regionserver sends a specific exception when it receives a query on a moved region. This exception contains the new address.
> - the client analyses the exeptions and update its cache accordingly...

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira

[jira] [Updated] (HBASE-5877) When a query fails because the region has moved, let the regionserver returns the new address to the client

Posted by "nkeywal (JIRA)" <ji...@apache.org>.

     [ https://issues.apache.org/jira/browse/HBASE-5877?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

nkeywal updated HBASE-5877:
---------------------------

    Attachment: 5877.v1.patch
    
> When a query fails because the region has moved, let the regionserver returns the new address to the client
> -----------------------------------------------------------------------------------------------------------
>
>                 Key: HBASE-5877
>                 URL: https://issues.apache.org/jira/browse/HBASE-5877
>             Project: HBase
>          Issue Type: Improvement
>          Components: client, master, regionserver
>    Affects Versions: 0.96.0
>            Reporter: nkeywal
>            Assignee: nkeywal
>            Priority: Minor
>         Attachments: 5877.v1.patch
>
>
> This is mainly useful when we do a rolling restart. This will decrease the load on the master and the network load.
> Note that a region is not immediately opened after a close. So:
> - it seems preferable to wait before retrying on the other server. An optimisation would be to have an heuristic depending on when the region was closed.
> - during a rolling restart, the server moves the regions then stops. So we may have failures when the server is stopped, and this patch won't help.
> The implementation in the first patch does:
> - on the region move, there is an added parameter on the regionserver#close to say where we are sending the region
> - the regionserver keeps a list of what was moved. Each entry is kept 100 seconds.
> - the regionserver sends a specific exception when it receives a query on a moved region. This exception contains the new address.
> - the client analyses the exeptions and update its cache accordingly...

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira

[jira] [Updated] (HBASE-5877) When a query fails because the region has moved, let the regionserver return the new address to the client

Posted by "nkeywal (JIRA)" <ji...@apache.org>.

     [ https://issues.apache.org/jira/browse/HBASE-5877?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

nkeywal updated HBASE-5877:
---------------------------

    Status: Patch Available  (was: Open)
    
> When a query fails because the region has moved, let the regionserver return the new address to the client
> ----------------------------------------------------------------------------------------------------------
>
>                 Key: HBASE-5877
>                 URL: https://issues.apache.org/jira/browse/HBASE-5877
>             Project: HBase
>          Issue Type: Improvement
>          Components: client, master, regionserver
>    Affects Versions: 0.96.0
>            Reporter: nkeywal
>            Assignee: nkeywal
>            Priority: Minor
>             Fix For: 0.96.0
>
>         Attachments: 5877.v1.patch, 5877.v12.patch, 5877.v6.patch
>
>
> This is mainly useful when we do a rolling restart. This will decrease the load on the master and the network load.
> Note that a region is not immediately opened after a close. So:
> - it seems preferable to wait before retrying on the other server. An optimisation would be to have an heuristic depending on when the region was closed.
> - during a rolling restart, the server moves the regions then stops. So we may have failures when the server is stopped, and this patch won't help.
> The implementation in the first patch does:
> - on the region move, there is an added parameter on the regionserver#close to say where we are sending the region
> - the regionserver keeps a list of what was moved. Each entry is kept 100 seconds.
> - the regionserver sends a specific exception when it receives a query on a moved region. This exception contains the new address.
> - the client analyses the exeptions and update its cache accordingly...

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira

[jira] [Commented] (HBASE-5877) When a query fails because the region has moved, let the regionserver return the new address to the client

Posted by "Hudson (JIRA)" <ji...@apache.org>.

    [ https://issues.apache.org/jira/browse/HBASE-5877?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13272116#comment-13272116 ] 

Hudson commented on HBASE-5877:
-------------------------------

Integrated in HBase-TRUNK-security #197 (See [https://builds.apache.org/job/HBase-TRUNK-security/197/])
    HBASE-5877 When a query fails because the region has moved, let the regionserver return the new address to the client (N Keywal) (Revision 1336301)

     Result = FAILURE
tedyu : 
Files : 
* /hbase/trunk/src/main/java/org/apache/hadoop/hbase/RegionMovedException.java
* /hbase/trunk/src/main/java/org/apache/hadoop/hbase/client/HConnectionManager.java
* /hbase/trunk/src/main/java/org/apache/hadoop/hbase/master/AssignmentManager.java
* /hbase/trunk/src/main/java/org/apache/hadoop/hbase/master/ServerManager.java
* /hbase/trunk/src/main/java/org/apache/hadoop/hbase/protobuf/ProtobufUtil.java
* /hbase/trunk/src/main/java/org/apache/hadoop/hbase/protobuf/RequestConverter.java
* /hbase/trunk/src/main/java/org/apache/hadoop/hbase/protobuf/generated/AdminProtos.java
* /hbase/trunk/src/main/java/org/apache/hadoop/hbase/regionserver/HRegionServer.java
* /hbase/trunk/src/main/java/org/apache/hadoop/hbase/regionserver/OnlineRegions.java
* /hbase/trunk/src/main/java/org/apache/hadoop/hbase/regionserver/SplitTransaction.java
* /hbase/trunk/src/main/java/org/apache/hadoop/hbase/regionserver/handler/CloseRegionHandler.java
* /hbase/trunk/src/main/protobuf/Admin.proto
* /hbase/trunk/src/test/java/org/apache/hadoop/hbase/client/TestHCM.java
* /hbase/trunk/src/test/java/org/apache/hadoop/hbase/master/MockRegionServer.java
* /hbase/trunk/src/test/java/org/apache/hadoop/hbase/util/MockRegionServerServices.java

                
> When a query fails because the region has moved, let the regionserver return the new address to the client
> ----------------------------------------------------------------------------------------------------------
>
>                 Key: HBASE-5877
>                 URL: https://issues.apache.org/jira/browse/HBASE-5877
>             Project: HBase
>          Issue Type: Improvement
>          Components: client, master, regionserver
>    Affects Versions: 0.96.0
>            Reporter: nkeywal
>            Assignee: nkeywal
>            Priority: Minor
>             Fix For: 0.96.0
>
>         Attachments: 5877-v16.txt, 5877-v17.txt, 5877-v17.txt, 5877.v1.patch, 5877.v12.patch, 5877.v15.patch, 5877.v18.patch, 5877.v18.patch, 5877.v6.patch
>
>
> This is mainly useful when we do a rolling restart. This will decrease the load on the master and the network load.
> Note that a region is not immediately opened after a close. So:
> - it seems preferable to wait before retrying on the other server. An optimisation would be to have an heuristic depending on when the region was closed.
> - during a rolling restart, the server moves the regions then stops. So we may have failures when the server is stopped, and this patch won't help.
> The implementation in the first patch does:
> - on the region move, there is an added parameter on the regionserver#close to say where we are sending the region
> - the regionserver keeps a list of what was moved. Each entry is kept 100 seconds.
> - the regionserver sends a specific exception when it receives a query on a moved region. This exception contains the new address.
> - the client analyses the exeptions and update its cache accordingly...

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira

[jira] [Updated] (HBASE-5877) When a query fails because the region has moved, let the regionserver return the new address to the client

Posted by "stack (JIRA)" <ji...@apache.org>.

     [ https://issues.apache.org/jira/browse/HBASE-5877?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

stack updated HBASE-5877:
-------------------------

    Status: Open  (was: Patch Available)
    
> When a query fails because the region has moved, let the regionserver return the new address to the client
> ----------------------------------------------------------------------------------------------------------
>
>                 Key: HBASE-5877
>                 URL: https://issues.apache.org/jira/browse/HBASE-5877
>             Project: HBase
>          Issue Type: Improvement
>          Components: client, master, regionserver
>    Affects Versions: 0.96.0
>            Reporter: nkeywal
>            Assignee: nkeywal
>            Priority: Minor
>             Fix For: 0.96.0
>
>         Attachments: 5877-v16.txt, 5877-v17.txt, 5877-v17.txt, 5877.v1.patch, 5877.v12.patch, 5877.v15.patch, 5877.v6.patch
>
>
> This is mainly useful when we do a rolling restart. This will decrease the load on the master and the network load.
> Note that a region is not immediately opened after a close. So:
> - it seems preferable to wait before retrying on the other server. An optimisation would be to have an heuristic depending on when the region was closed.
> - during a rolling restart, the server moves the regions then stops. So we may have failures when the server is stopped, and this patch won't help.
> The implementation in the first patch does:
> - on the region move, there is an added parameter on the regionserver#close to say where we are sending the region
> - the regionserver keeps a list of what was moved. Each entry is kept 100 seconds.
> - the regionserver sends a specific exception when it receives a query on a moved region. This exception contains the new address.
> - the client analyses the exeptions and update its cache accordingly...

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira

[jira] [Commented] (HBASE-5877) When a query fails because the region has moved, let the regionserver return the new address to the client

Posted by "Hadoop QA (JIRA)" <ji...@apache.org>.

    [ https://issues.apache.org/jira/browse/HBASE-5877?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13268649#comment-13268649 ] 

Hadoop QA commented on HBASE-5877:
----------------------------------

-1 overall.  Here are the results of testing the latest attachment 
  http://issues.apache.org/jira/secure/attachment/12525660/5877.v15.patch
  against trunk revision .

    +1 @author.  The patch does not contain any @author tags.

    +1 tests included.  The patch appears to include 9 new or modified tests.

    +1 hadoop23.  The patch compiles against the hadoop 0.23.x profile.

    +1 javadoc.  The javadoc tool did not generate any warning messages.

    +1 javac.  The applied patch does not increase the total number of javac compiler warnings.

    +1 findbugs.  The patch does not introduce any new Findbugs (version 1.3.9) warnings.

    +1 release audit.  The applied patch does not increase the total number of release audit warnings.

     -1 core tests.  The patch failed these unit tests:
                       org.apache.hadoop.hbase.client.TestMultiParallel

Test results: https://builds.apache.org/job/PreCommit-HBASE-Build/1770//testReport/
Findbugs warnings: https://builds.apache.org/job/PreCommit-HBASE-Build/1770//artifact/trunk/patchprocess/newPatchFindbugsWarnings.html
Console output: https://builds.apache.org/job/PreCommit-HBASE-Build/1770//console

This message is automatically generated.
                
> When a query fails because the region has moved, let the regionserver return the new address to the client
> ----------------------------------------------------------------------------------------------------------
>
>                 Key: HBASE-5877
>                 URL: https://issues.apache.org/jira/browse/HBASE-5877
>             Project: HBase
>          Issue Type: Improvement
>          Components: client, master, regionserver
>    Affects Versions: 0.96.0
>            Reporter: nkeywal
>            Assignee: nkeywal
>            Priority: Minor
>             Fix For: 0.96.0
>
>         Attachments: 5877.v1.patch, 5877.v12.patch, 5877.v15.patch, 5877.v6.patch
>
>
> This is mainly useful when we do a rolling restart. This will decrease the load on the master and the network load.
> Note that a region is not immediately opened after a close. So:
> - it seems preferable to wait before retrying on the other server. An optimisation would be to have an heuristic depending on when the region was closed.
> - during a rolling restart, the server moves the regions then stops. So we may have failures when the server is stopped, and this patch won't help.
> The implementation in the first patch does:
> - on the region move, there is an added parameter on the regionserver#close to say where we are sending the region
> - the regionserver keeps a list of what was moved. Each entry is kept 100 seconds.
> - the regionserver sends a specific exception when it receives a query on a moved region. This exception contains the new address.
> - the client analyses the exeptions and update its cache accordingly...

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira

[jira] [Commented] (HBASE-5877) When a query fails because the region has moved, let the regionserver return the new address to the client

Posted by "stack (JIRA)" <ji...@apache.org>.

    [ https://issues.apache.org/jira/browse/HBASE-5877?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13267787#comment-13267787 ] 

stack commented on HBASE-5877:
------------------------------

You don't want to have RegionMovedException carry a ServerName#toString instead of host and port?  Or it doesn't make sense when our cached region exceptions are keyed by hostname+port only?

Is this a bug fix?

{code}
@@ -1910,6 +1989,7 @@ public class HConnectionManager {
             }
           } catch (ExecutionException e) {
             LOG.debug("Failed all from " + loc, e);
+            updateCachedLocations(updateHistory, loc, e);
{code}

Put the history of moved regions out into its own class?

Don't presize this I'd say:

+  private static final long TIMEOUT_REGION_MOVED = (2L * 60L * 1000L);

Stuff is lazily cleared from movedRegions?  Should we have a cleaner come visit occasionally?

Patch looks fine to me.  Nice fat test.

bq. 5) The destination is the closeRegion interface is a kind of interface hijacking. Other options would be:

Why you say the above?  When we protobuf it, it'll just be an option so it shouldn't be too bad?

The HCM stuff is ugly but thats not your fault.


                
> When a query fails because the region has moved, let the regionserver return the new address to the client
> ----------------------------------------------------------------------------------------------------------
>
>                 Key: HBASE-5877
>                 URL: https://issues.apache.org/jira/browse/HBASE-5877
>             Project: HBase
>          Issue Type: Improvement
>          Components: client, master, regionserver
>    Affects Versions: 0.96.0
>            Reporter: nkeywal
>            Assignee: nkeywal
>            Priority: Minor
>             Fix For: 0.96.0
>
>         Attachments: 5877.v1.patch, 5877.v12.patch, 5877.v6.patch
>
>
> This is mainly useful when we do a rolling restart. This will decrease the load on the master and the network load.
> Note that a region is not immediately opened after a close. So:
> - it seems preferable to wait before retrying on the other server. An optimisation would be to have an heuristic depending on when the region was closed.
> - during a rolling restart, the server moves the regions then stops. So we may have failures when the server is stopped, and this patch won't help.
> The implementation in the first patch does:
> - on the region move, there is an added parameter on the regionserver#close to say where we are sending the region
> - the regionserver keeps a list of what was moved. Each entry is kept 100 seconds.
> - the regionserver sends a specific exception when it receives a query on a moved region. This exception contains the new address.
> - the client analyses the exeptions and update its cache accordingly...

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira

[jira] [Updated] (HBASE-5877) When a query fails because the region has moved, let the regionserver return the new address to the client

Posted by "Zhihong Yu (JIRA)" <ji...@apache.org>.

     [ https://issues.apache.org/jira/browse/HBASE-5877?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Zhihong Yu updated HBASE-5877:
------------------------------

    Attachment: 5877-v17.txt

I forgot to add the check in one place.

Trying v17.
                
> When a query fails because the region has moved, let the regionserver return the new address to the client
> ----------------------------------------------------------------------------------------------------------
>
>                 Key: HBASE-5877
>                 URL: https://issues.apache.org/jira/browse/HBASE-5877
>             Project: HBase
>          Issue Type: Improvement
>          Components: client, master, regionserver
>    Affects Versions: 0.96.0
>            Reporter: nkeywal
>            Assignee: nkeywal
>            Priority: Minor
>             Fix For: 0.96.0
>
>         Attachments: 5877-v16.txt, 5877-v17.txt, 5877.v1.patch, 5877.v12.patch, 5877.v15.patch, 5877.v6.patch
>
>
> This is mainly useful when we do a rolling restart. This will decrease the load on the master and the network load.
> Note that a region is not immediately opened after a close. So:
> - it seems preferable to wait before retrying on the other server. An optimisation would be to have an heuristic depending on when the region was closed.
> - during a rolling restart, the server moves the regions then stops. So we may have failures when the server is stopped, and this patch won't help.
> The implementation in the first patch does:
> - on the region move, there is an added parameter on the regionserver#close to say where we are sending the region
> - the regionserver keeps a list of what was moved. Each entry is kept 100 seconds.
> - the regionserver sends a specific exception when it receives a query on a moved region. This exception contains the new address.
> - the client analyses the exeptions and update its cache accordingly...

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira

[jira] [Commented] (HBASE-5877) When a query fails because the region has moved, let the regionserver return the new address to the client

Posted by "Hadoop QA (JIRA)" <ji...@apache.org>.

    [ https://issues.apache.org/jira/browse/HBASE-5877?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13270250#comment-13270250 ] 

Hadoop QA commented on HBASE-5877:
----------------------------------

-1 overall.  Here are the results of testing the latest attachment 
  http://issues.apache.org/jira/secure/attachment/12525963/5877-v17.txt
  against trunk revision .

    +1 @author.  The patch does not contain any @author tags.

    +1 tests included.  The patch appears to include 9 new or modified tests.

    -1 patch.  The patch command could not apply the patch.

Console output: https://builds.apache.org/job/PreCommit-HBASE-Build/1793//console

This message is automatically generated.
                
> When a query fails because the region has moved, let the regionserver return the new address to the client
> ----------------------------------------------------------------------------------------------------------
>
>                 Key: HBASE-5877
>                 URL: https://issues.apache.org/jira/browse/HBASE-5877
>             Project: HBase
>          Issue Type: Improvement
>          Components: client, master, regionserver
>    Affects Versions: 0.96.0
>            Reporter: nkeywal
>            Assignee: nkeywal
>            Priority: Minor
>             Fix For: 0.96.0
>
>         Attachments: 5877-v16.txt, 5877-v17.txt, 5877-v17.txt, 5877.v1.patch, 5877.v12.patch, 5877.v15.patch, 5877.v6.patch
>
>
> This is mainly useful when we do a rolling restart. This will decrease the load on the master and the network load.
> Note that a region is not immediately opened after a close. So:
> - it seems preferable to wait before retrying on the other server. An optimisation would be to have an heuristic depending on when the region was closed.
> - during a rolling restart, the server moves the regions then stops. So we may have failures when the server is stopped, and this patch won't help.
> The implementation in the first patch does:
> - on the region move, there is an added parameter on the regionserver#close to say where we are sending the region
> - the regionserver keeps a list of what was moved. Each entry is kept 100 seconds.
> - the regionserver sends a specific exception when it receives a query on a moved region. This exception contains the new address.
> - the client analyses the exeptions and update its cache accordingly...

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira

[jira] [Updated] (HBASE-5877) When a query fails because the region has moved, let the regionserver return the new address to the client

Posted by "nkeywal (JIRA)" <ji...@apache.org>.

     [ https://issues.apache.org/jira/browse/HBASE-5877?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

nkeywal updated HBASE-5877:
---------------------------

    Attachment: 5877.v15.patch
    
> When a query fails because the region has moved, let the regionserver return the new address to the client
> ----------------------------------------------------------------------------------------------------------
>
>                 Key: HBASE-5877
>                 URL: https://issues.apache.org/jira/browse/HBASE-5877
>             Project: HBase
>          Issue Type: Improvement
>          Components: client, master, regionserver
>    Affects Versions: 0.96.0
>            Reporter: nkeywal
>            Assignee: nkeywal
>            Priority: Minor
>             Fix For: 0.96.0
>
>         Attachments: 5877.v1.patch, 5877.v12.patch, 5877.v15.patch, 5877.v6.patch
>
>
> This is mainly useful when we do a rolling restart. This will decrease the load on the master and the network load.
> Note that a region is not immediately opened after a close. So:
> - it seems preferable to wait before retrying on the other server. An optimisation would be to have an heuristic depending on when the region was closed.
> - during a rolling restart, the server moves the regions then stops. So we may have failures when the server is stopped, and this patch won't help.
> The implementation in the first patch does:
> - on the region move, there is an added parameter on the regionserver#close to say where we are sending the region
> - the regionserver keeps a list of what was moved. Each entry is kept 100 seconds.
> - the regionserver sends a specific exception when it receives a query on a moved region. This exception contains the new address.
> - the client analyses the exeptions and update its cache accordingly...

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira

[jira] [Updated] (HBASE-5877) When a query fails because the region has moved, let the regionserver return the new address to the client

Posted by "nkeywal (JIRA)" <ji...@apache.org>.

     [ https://issues.apache.org/jira/browse/HBASE-5877?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

nkeywal updated HBASE-5877:
---------------------------

    Status: Patch Available  (was: Open)
    
> When a query fails because the region has moved, let the regionserver return the new address to the client
> ----------------------------------------------------------------------------------------------------------
>
>                 Key: HBASE-5877
>                 URL: https://issues.apache.org/jira/browse/HBASE-5877
>             Project: HBase
>          Issue Type: Improvement
>          Components: client, master, regionserver
>    Affects Versions: 0.96.0
>            Reporter: nkeywal
>            Assignee: nkeywal
>            Priority: Minor
>             Fix For: 0.96.0
>
>         Attachments: 5877-v16.txt, 5877-v17.txt, 5877-v17.txt, 5877.v1.patch, 5877.v12.patch, 5877.v15.patch, 5877.v18.patch, 5877.v18.patch, 5877.v6.patch
>
>
> This is mainly useful when we do a rolling restart. This will decrease the load on the master and the network load.
> Note that a region is not immediately opened after a close. So:
> - it seems preferable to wait before retrying on the other server. An optimisation would be to have an heuristic depending on when the region was closed.
> - during a rolling restart, the server moves the regions then stops. So we may have failures when the server is stopped, and this patch won't help.
> The implementation in the first patch does:
> - on the region move, there is an added parameter on the regionserver#close to say where we are sending the region
> - the regionserver keeps a list of what was moved. Each entry is kept 100 seconds.
> - the regionserver sends a specific exception when it receives a query on a moved region. This exception contains the new address.
> - the client analyses the exeptions and update its cache accordingly...

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira

[jira] [Commented] (HBASE-5877) When a query fails because the region has moved, let the regionserver return the new address to the client

Posted by "Zhihong Yu (JIRA)" <ji...@apache.org>.

    [ https://issues.apache.org/jira/browse/HBASE-5877?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13267879#comment-13267879 ] 

Zhihong Yu commented on HBASE-5877:
-----------------------------------

{code}
+        final Set<String> updateHistory = new HashSet<String>(100);
{code}
Since updateCachedLocations() is used to handle exception, the presizing above may not be needed.
{code}
+    this(server, rsServices, regionInfo, abort, zk, versionOfClosingNode, eventType, null);
+  }
+
+    protected CloseRegionHandler(final Server server,
{code}
The indentation of CloseRegionHandler() above is off.
{code}
+  // This map will contains all the regions that we closed for a move.
+  //  We add the time it was moved as we don't want to keep a too old information
{code}
'will contains' -> 'will contain'. 'keep a too old' -> 'keep too old'.

This is a very useful feature. Good work, N.
                
> When a query fails because the region has moved, let the regionserver return the new address to the client
> ----------------------------------------------------------------------------------------------------------
>
>                 Key: HBASE-5877
>                 URL: https://issues.apache.org/jira/browse/HBASE-5877
>             Project: HBase
>          Issue Type: Improvement
>          Components: client, master, regionserver
>    Affects Versions: 0.96.0
>            Reporter: nkeywal
>            Assignee: nkeywal
>            Priority: Minor
>             Fix For: 0.96.0
>
>         Attachments: 5877.v1.patch, 5877.v12.patch, 5877.v6.patch
>
>
> This is mainly useful when we do a rolling restart. This will decrease the load on the master and the network load.
> Note that a region is not immediately opened after a close. So:
> - it seems preferable to wait before retrying on the other server. An optimisation would be to have an heuristic depending on when the region was closed.
> - during a rolling restart, the server moves the regions then stops. So we may have failures when the server is stopped, and this patch won't help.
> The implementation in the first patch does:
> - on the region move, there is an added parameter on the regionserver#close to say where we are sending the region
> - the regionserver keeps a list of what was moved. Each entry is kept 100 seconds.
> - the regionserver sends a specific exception when it receives a query on a moved region. This exception contains the new address.
> - the client analyses the exeptions and update its cache accordingly...

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira

[jira] [Commented] (HBASE-5877) When a query fails because the region has moved, let the regionserver return the new address to the client

Posted by "nkeywal (JIRA)" <ji...@apache.org>.

    [ https://issues.apache.org/jira/browse/HBASE-5877?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13267751#comment-13267751 ] 

nkeywal commented on HBASE-5877:
--------------------------------

v12, should be final.

1) ServerName is used everywhere in the interface, thanks to protobuf
2) hadoop.ipc serialization of exception is based on the #getMessage. So we have to parse it internally. It's not visisble to the exception user.
3) The code to manage the error in the client package is quite complex. We have the exception at the very beginning, and then it's checked again, but we don't have the real exception anymore. I used a new "historyList" to make it works. There is another JIRA for other improvement, in which I could get rid of this (HBASE-5924)
4) Generated with protobuf 2.4.1
5) The destination is the closeRegion interface is a kind of interface hijacking. Other options would be:
- sharing the region state in zookeeper
- letting the regionserver calls the master to get the new server. On paper this would be more efficient than a client -> master call. In both cases we could consider that the client should not connect to the master except for cluster administration (create table, split regin; ...). That would increase global reliability. That's for another discussion as well I think.
6) RegionServerServices has been modified to set a destination when removing a region from the online regions.
7) In another JIRA I will manage the case when the destination is not specified when calling the move function.
                
> When a query fails because the region has moved, let the regionserver return the new address to the client
> ----------------------------------------------------------------------------------------------------------
>
>                 Key: HBASE-5877
>                 URL: https://issues.apache.org/jira/browse/HBASE-5877
>             Project: HBase
>          Issue Type: Improvement
>          Components: client, master, regionserver
>    Affects Versions: 0.96.0
>            Reporter: nkeywal
>            Assignee: nkeywal
>            Priority: Minor
>             Fix For: 0.96.0
>
>         Attachments: 5877.v1.patch, 5877.v12.patch, 5877.v6.patch
>
>
> This is mainly useful when we do a rolling restart. This will decrease the load on the master and the network load.
> Note that a region is not immediately opened after a close. So:
> - it seems preferable to wait before retrying on the other server. An optimisation would be to have an heuristic depending on when the region was closed.
> - during a rolling restart, the server moves the regions then stops. So we may have failures when the server is stopped, and this patch won't help.
> The implementation in the first patch does:
> - on the region move, there is an added parameter on the regionserver#close to say where we are sending the region
> - the regionserver keeps a list of what was moved. Each entry is kept 100 seconds.
> - the regionserver sends a specific exception when it receives a query on a moved region. This exception contains the new address.
> - the client analyses the exeptions and update its cache accordingly...

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira

[jira] [Commented] (HBASE-5877) When a query fails because the region has moved, let the regionserver return the new address to the client

Posted by "nkeywal (JIRA)" <ji...@apache.org>.

    [ https://issues.apache.org/jira/browse/HBASE-5877?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13262717#comment-13262717 ] 

nkeywal commented on HBASE-5877:
--------------------------------

Note that I'm currently rewriting the patch, as it conflicts with the protobuf stuff that was committed recently... But the logic hasn't changed.

@ted What we're saving in the current implementation is a call to the master. It can be interesting in itself if the region moves is used by a lot of clients. We could do better by letting the client know that the region is now fully available somewhere else and that there is no need to wait before retrying. But right now the region server only knows that the region is closed and moved to another server. It doesn't know if the region is opened yet. We could have this by adding the info in zk, but it would increase the zk load...
                
> When a query fails because the region has moved, let the regionserver return the new address to the client
> ----------------------------------------------------------------------------------------------------------
>
>                 Key: HBASE-5877
>                 URL: https://issues.apache.org/jira/browse/HBASE-5877
>             Project: HBase
>          Issue Type: Improvement
>          Components: client, master, regionserver
>    Affects Versions: 0.96.0
>            Reporter: nkeywal
>            Assignee: nkeywal
>            Priority: Minor
>             Fix For: 0.96.0
>
>         Attachments: 5877.v1.patch
>
>
> This is mainly useful when we do a rolling restart. This will decrease the load on the master and the network load.
> Note that a region is not immediately opened after a close. So:
> - it seems preferable to wait before retrying on the other server. An optimisation would be to have an heuristic depending on when the region was closed.
> - during a rolling restart, the server moves the regions then stops. So we may have failures when the server is stopped, and this patch won't help.
> The implementation in the first patch does:
> - on the region move, there is an added parameter on the regionserver#close to say where we are sending the region
> - the regionserver keeps a list of what was moved. Each entry is kept 100 seconds.
> - the regionserver sends a specific exception when it receives a query on a moved region. This exception contains the new address.
> - the client analyses the exeptions and update its cache accordingly...

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira

[jira] [Updated] (HBASE-5877) When a query fails because the region has moved, let the regionserver return the new address to the client

Posted by "nkeywal (JIRA)" <ji...@apache.org>.

     [ https://issues.apache.org/jira/browse/HBASE-5877?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

nkeywal updated HBASE-5877:
---------------------------

    Attachment: 5877.v18.patch

Hopefully I didn't break anything while doing the merge. Local tests are ok. I included Ted's fix.
                
> When a query fails because the region has moved, let the regionserver return the new address to the client
> ----------------------------------------------------------------------------------------------------------
>
>                 Key: HBASE-5877
>                 URL: https://issues.apache.org/jira/browse/HBASE-5877
>             Project: HBase
>          Issue Type: Improvement
>          Components: client, master, regionserver
>    Affects Versions: 0.96.0
>            Reporter: nkeywal
>            Assignee: nkeywal
>            Priority: Minor
>             Fix For: 0.96.0
>
>         Attachments: 5877-v16.txt, 5877-v17.txt, 5877-v17.txt, 5877.v1.patch, 5877.v12.patch, 5877.v15.patch, 5877.v18.patch, 5877.v6.patch
>
>
> This is mainly useful when we do a rolling restart. This will decrease the load on the master and the network load.
> Note that a region is not immediately opened after a close. So:
> - it seems preferable to wait before retrying on the other server. An optimisation would be to have an heuristic depending on when the region was closed.
> - during a rolling restart, the server moves the regions then stops. So we may have failures when the server is stopped, and this patch won't help.
> The implementation in the first patch does:
> - on the region move, there is an added parameter on the regionserver#close to say where we are sending the region
> - the regionserver keeps a list of what was moved. Each entry is kept 100 seconds.
> - the regionserver sends a specific exception when it receives a query on a moved region. This exception contains the new address.
> - the client analyses the exeptions and update its cache accordingly...

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira

[jira] [Updated] (HBASE-5877) When a query fails because the region has moved, let the regionserver return the new address to the client

Posted by "Zhihong Yu (JIRA)" <ji...@apache.org>.

     [ https://issues.apache.org/jira/browse/HBASE-5877?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Zhihong Yu updated HBASE-5877:
------------------------------

    Comment: was deleted

(was: @N:
I think the following validation in real cluster would illustrate the benefit of this feature:
For given table, select a region server and note the row key ranges hosted by the region server. Direct client load to this server.
Kill the server at time T.

Difference in client response to region migration around time T with and without the patch would be interesting.)
    
> When a query fails because the region has moved, let the regionserver return the new address to the client
> ----------------------------------------------------------------------------------------------------------
>
>                 Key: HBASE-5877
>                 URL: https://issues.apache.org/jira/browse/HBASE-5877
>             Project: HBase
>          Issue Type: Improvement
>          Components: client, master, regionserver
>    Affects Versions: 0.96.0
>            Reporter: nkeywal
>            Assignee: nkeywal
>            Priority: Minor
>             Fix For: 0.96.0
>
>         Attachments: 5877.v1.patch
>
>
> This is mainly useful when we do a rolling restart. This will decrease the load on the master and the network load.
> Note that a region is not immediately opened after a close. So:
> - it seems preferable to wait before retrying on the other server. An optimisation would be to have an heuristic depending on when the region was closed.
> - during a rolling restart, the server moves the regions then stops. So we may have failures when the server is stopped, and this patch won't help.
> The implementation in the first patch does:
> - on the region move, there is an added parameter on the regionserver#close to say where we are sending the region
> - the regionserver keeps a list of what was moved. Each entry is kept 100 seconds.
> - the regionserver sends a specific exception when it receives a query on a moved region. This exception contains the new address.
> - the client analyses the exeptions and update its cache accordingly...

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira

[jira] [Updated] (HBASE-5877) When a query fails because the region has moved, let the regionserver return the new address to the client

Posted by "nkeywal (JIRA)" <ji...@apache.org>.

     [ https://issues.apache.org/jira/browse/HBASE-5877?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

nkeywal updated HBASE-5877:
---------------------------

    Status: Open  (was: Patch Available)
    
> When a query fails because the region has moved, let the regionserver return the new address to the client
> ----------------------------------------------------------------------------------------------------------
>
>                 Key: HBASE-5877
>                 URL: https://issues.apache.org/jira/browse/HBASE-5877
>             Project: HBase
>          Issue Type: Improvement
>          Components: client, master, regionserver
>    Affects Versions: 0.96.0
>            Reporter: nkeywal
>            Assignee: nkeywal
>            Priority: Minor
>             Fix For: 0.96.0
>
>         Attachments: 5877-v16.txt, 5877-v17.txt, 5877-v17.txt, 5877.v1.patch, 5877.v12.patch, 5877.v15.patch, 5877.v6.patch
>
>
> This is mainly useful when we do a rolling restart. This will decrease the load on the master and the network load.
> Note that a region is not immediately opened after a close. So:
> - it seems preferable to wait before retrying on the other server. An optimisation would be to have an heuristic depending on when the region was closed.
> - during a rolling restart, the server moves the regions then stops. So we may have failures when the server is stopped, and this patch won't help.
> The implementation in the first patch does:
> - on the region move, there is an added parameter on the regionserver#close to say where we are sending the region
> - the regionserver keeps a list of what was moved. Each entry is kept 100 seconds.
> - the regionserver sends a specific exception when it receives a query on a moved region. This exception contains the new address.
> - the client analyses the exeptions and update its cache accordingly...

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira

[jira] [Updated] (HBASE-5877) When a query fails because the region has moved, let the regionserver return the new address to the client

Posted by "nkeywal (JIRA)" <ji...@apache.org>.

     [ https://issues.apache.org/jira/browse/HBASE-5877?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

nkeywal updated HBASE-5877:
---------------------------

    Resolution: Fixed
        Status: Resolved  (was: Patch Available)
    
> When a query fails because the region has moved, let the regionserver return the new address to the client
> ----------------------------------------------------------------------------------------------------------
>
>                 Key: HBASE-5877
>                 URL: https://issues.apache.org/jira/browse/HBASE-5877
>             Project: HBase
>          Issue Type: Improvement
>          Components: client, master, regionserver
>    Affects Versions: 0.96.0
>            Reporter: nkeywal
>            Assignee: nkeywal
>            Priority: Minor
>             Fix For: 0.96.0
>
>         Attachments: 5877-v16.txt, 5877-v17.txt, 5877-v17.txt, 5877.v1.patch, 5877.v12.patch, 5877.v15.patch, 5877.v18.patch, 5877.v18.patch, 5877.v18.patch, 5877.v6.patch
>
>
> This is mainly useful when we do a rolling restart. This will decrease the load on the master and the network load.
> Note that a region is not immediately opened after a close. So:
> - it seems preferable to wait before retrying on the other server. An optimisation would be to have an heuristic depending on when the region was closed.
> - during a rolling restart, the server moves the regions then stops. So we may have failures when the server is stopped, and this patch won't help.
> The implementation in the first patch does:
> - on the region move, there is an added parameter on the regionserver#close to say where we are sending the region
> - the regionserver keeps a list of what was moved. Each entry is kept 100 seconds.
> - the regionserver sends a specific exception when it receives a query on a moved region. This exception contains the new address.
> - the client analyses the exeptions and update its cache accordingly...

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira

[jira] [Commented] (HBASE-5877) When a query fails because the region has moved, let the regionserver return the new address to the client

Posted by "nkeywal (JIRA)" <ji...@apache.org>.

    [ https://issues.apache.org/jira/browse/HBASE-5877?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13507245#comment-13507245 ] 

nkeywal commented on HBASE-5877:
--------------------------------

@jd Yep, you're right.
                
> When a query fails because the region has moved, let the regionserver return the new address to the client
> ----------------------------------------------------------------------------------------------------------
>
>                 Key: HBASE-5877
>                 URL: https://issues.apache.org/jira/browse/HBASE-5877
>             Project: HBase
>          Issue Type: Improvement
>          Components: Client, master, regionserver
>    Affects Versions: 0.96.0
>            Reporter: nkeywal
>            Assignee: nkeywal
>            Priority: Minor
>             Fix For: 0.96.0
>
>         Attachments: 5877.v12.patch, 5877.v15.patch, 5877-v16.txt, 5877-v17.txt, 5877-v17.txt, 5877.v18.patch, 5877.v18.patch, 5877.v18.patch, 5877.v1.patch, 5877.v6.patch
>
>
> This is mainly useful when we do a rolling restart. This will decrease the load on the master and the network load.
> Note that a region is not immediately opened after a close. So:
> - it seems preferable to wait before retrying on the other server. An optimisation would be to have an heuristic depending on when the region was closed.
> - during a rolling restart, the server moves the regions then stops. So we may have failures when the server is stopped, and this patch won't help.
> The implementation in the first patch does:
> - on the region move, there is an added parameter on the regionserver#close to say where we are sending the region
> - the regionserver keeps a list of what was moved. Each entry is kept 100 seconds.
> - the regionserver sends a specific exception when it receives a query on a moved region. This exception contains the new address.
> - the client analyses the exeptions and update its cache accordingly...

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira

[jira] [Commented] (HBASE-5877) When a query fails because the region has moved, let the regionserver return the new address to the client

Posted by "nkeywal (JIRA)" <ji...@apache.org>.

    [ https://issues.apache.org/jira/browse/HBASE-5877?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13266827#comment-13266827 ] 

nkeywal commented on HBASE-5877:
--------------------------------

I have the same locally, so it's likely my patch...
                
> When a query fails because the region has moved, let the regionserver return the new address to the client
> ----------------------------------------------------------------------------------------------------------
>
>                 Key: HBASE-5877
>                 URL: https://issues.apache.org/jira/browse/HBASE-5877
>             Project: HBase
>          Issue Type: Improvement
>          Components: client, master, regionserver
>    Affects Versions: 0.96.0
>            Reporter: nkeywal
>            Assignee: nkeywal
>            Priority: Minor
>             Fix For: 0.96.0
>
>         Attachments: 5877.v1.patch, 5877.v6.patch
>
>
> This is mainly useful when we do a rolling restart. This will decrease the load on the master and the network load.
> Note that a region is not immediately opened after a close. So:
> - it seems preferable to wait before retrying on the other server. An optimisation would be to have an heuristic depending on when the region was closed.
> - during a rolling restart, the server moves the regions then stops. So we may have failures when the server is stopped, and this patch won't help.
> The implementation in the first patch does:
> - on the region move, there is an added parameter on the regionserver#close to say where we are sending the region
> - the regionserver keeps a list of what was moved. Each entry is kept 100 seconds.
> - the regionserver sends a specific exception when it receives a query on a moved region. This exception contains the new address.
> - the client analyses the exeptions and update its cache accordingly...

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira

[jira] [Commented] (HBASE-5877) When a query fails because the region has moved, let the regionserver return the new address to the client

Posted by "Zhihong Yu (JIRA)" <ji...@apache.org>.

    [ https://issues.apache.org/jira/browse/HBASE-5877?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13262714#comment-13262714 ] 

Zhihong Yu commented on HBASE-5877:
-----------------------------------

@N:
I think the following validation in real cluster would illustrate the benefit of this feature:
For given table, select a region server and note the row key ranges hosted by one region on the region server. Direct client load to this region.
Issue the following command in shell: 
{code}
  hbase> move 'ENCODED_REGIONNAME', 'SERVER_NAME'
{code}
at time T.

Difference in client response to region migration around time T with and without the patch would be interesting.

                
> When a query fails because the region has moved, let the regionserver return the new address to the client
> ----------------------------------------------------------------------------------------------------------
>
>                 Key: HBASE-5877
>                 URL: https://issues.apache.org/jira/browse/HBASE-5877
>             Project: HBase
>          Issue Type: Improvement
>          Components: client, master, regionserver
>    Affects Versions: 0.96.0
>            Reporter: nkeywal
>            Assignee: nkeywal
>            Priority: Minor
>             Fix For: 0.96.0
>
>         Attachments: 5877.v1.patch
>
>
> This is mainly useful when we do a rolling restart. This will decrease the load on the master and the network load.
> Note that a region is not immediately opened after a close. So:
> - it seems preferable to wait before retrying on the other server. An optimisation would be to have an heuristic depending on when the region was closed.
> - during a rolling restart, the server moves the regions then stops. So we may have failures when the server is stopped, and this patch won't help.
> The implementation in the first patch does:
> - on the region move, there is an added parameter on the regionserver#close to say where we are sending the region
> - the regionserver keeps a list of what was moved. Each entry is kept 100 seconds.
> - the regionserver sends a specific exception when it receives a query on a moved region. This exception contains the new address.
> - the client analyses the exeptions and update its cache accordingly...

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira

[jira] [Commented] (HBASE-5877) When a query fails because the region has moved, let the regionserver return the new address to the client

Posted by "Zhihong Yu (JIRA)" <ji...@apache.org>.

    [ https://issues.apache.org/jira/browse/HBASE-5877?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13262710#comment-13262710 ] 

Zhihong Yu commented on HBASE-5877:
-----------------------------------

@N:
I think the following validation in real cluster would illustrate the benefit of this feature:
For given table, select a region server and note the row key ranges hosted by the region server. Direct client load to this server.
Kill the server at time T.

Difference in client response to region migration around time T with and without the patch would be interesting.
                
> When a query fails because the region has moved, let the regionserver return the new address to the client
> ----------------------------------------------------------------------------------------------------------
>
>                 Key: HBASE-5877
>                 URL: https://issues.apache.org/jira/browse/HBASE-5877
>             Project: HBase
>          Issue Type: Improvement
>          Components: client, master, regionserver
>    Affects Versions: 0.96.0
>            Reporter: nkeywal
>            Assignee: nkeywal
>            Priority: Minor
>             Fix For: 0.96.0
>
>         Attachments: 5877.v1.patch
>
>
> This is mainly useful when we do a rolling restart. This will decrease the load on the master and the network load.
> Note that a region is not immediately opened after a close. So:
> - it seems preferable to wait before retrying on the other server. An optimisation would be to have an heuristic depending on when the region was closed.
> - during a rolling restart, the server moves the regions then stops. So we may have failures when the server is stopped, and this patch won't help.
> The implementation in the first patch does:
> - on the region move, there is an added parameter on the regionserver#close to say where we are sending the region
> - the regionserver keeps a list of what was moved. Each entry is kept 100 seconds.
> - the regionserver sends a specific exception when it receives a query on a moved region. This exception contains the new address.
> - the client analyses the exeptions and update its cache accordingly...

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira

[jira] [Commented] (HBASE-5877) When a query fails because the region has moved, let the regionserver return the new address to the client

Posted by "nkeywal (JIRA)" <ji...@apache.org>.

    [ https://issues.apache.org/jira/browse/HBASE-5877?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13271590#comment-13271590 ] 

nkeywal commented on HBASE-5877:
--------------------------------

Seems to be ready for a commit :-]
                
> When a query fails because the region has moved, let the regionserver return the new address to the client
> ----------------------------------------------------------------------------------------------------------
>
>                 Key: HBASE-5877
>                 URL: https://issues.apache.org/jira/browse/HBASE-5877
>             Project: HBase
>          Issue Type: Improvement
>          Components: client, master, regionserver
>    Affects Versions: 0.96.0
>            Reporter: nkeywal
>            Assignee: nkeywal
>            Priority: Minor
>             Fix For: 0.96.0
>
>         Attachments: 5877-v16.txt, 5877-v17.txt, 5877-v17.txt, 5877.v1.patch, 5877.v12.patch, 5877.v15.patch, 5877.v18.patch, 5877.v18.patch, 5877.v6.patch
>
>
> This is mainly useful when we do a rolling restart. This will decrease the load on the master and the network load.
> Note that a region is not immediately opened after a close. So:
> - it seems preferable to wait before retrying on the other server. An optimisation would be to have an heuristic depending on when the region was closed.
> - during a rolling restart, the server moves the regions then stops. So we may have failures when the server is stopped, and this patch won't help.
> The implementation in the first patch does:
> - on the region move, there is an added parameter on the regionserver#close to say where we are sending the region
> - the regionserver keeps a list of what was moved. Each entry is kept 100 seconds.
> - the regionserver sends a specific exception when it receives a query on a moved region. This exception contains the new address.
> - the client analyses the exeptions and update its cache accordingly...

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira

[jira] [Updated] (HBASE-5877) When a query fails because the region has moved, let the regionserver return the new address to the client

Posted by "nkeywal (JIRA)" <ji...@apache.org>.

     [ https://issues.apache.org/jira/browse/HBASE-5877?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

nkeywal updated HBASE-5877:
---------------------------

    Attachment: 5877.v6.patch
    
> When a query fails because the region has moved, let the regionserver return the new address to the client
> ----------------------------------------------------------------------------------------------------------
>
>                 Key: HBASE-5877
>                 URL: https://issues.apache.org/jira/browse/HBASE-5877
>             Project: HBase
>          Issue Type: Improvement
>          Components: client, master, regionserver
>    Affects Versions: 0.96.0
>            Reporter: nkeywal
>            Assignee: nkeywal
>            Priority: Minor
>             Fix For: 0.96.0
>
>         Attachments: 5877.v1.patch, 5877.v6.patch
>
>
> This is mainly useful when we do a rolling restart. This will decrease the load on the master and the network load.
> Note that a region is not immediately opened after a close. So:
> - it seems preferable to wait before retrying on the other server. An optimisation would be to have an heuristic depending on when the region was closed.
> - during a rolling restart, the server moves the regions then stops. So we may have failures when the server is stopped, and this patch won't help.
> The implementation in the first patch does:
> - on the region move, there is an added parameter on the regionserver#close to say where we are sending the region
> - the regionserver keeps a list of what was moved. Each entry is kept 100 seconds.
> - the regionserver sends a specific exception when it receives a query on a moved region. This exception contains the new address.
> - the client analyses the exeptions and update its cache accordingly...

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira

[jira] [Commented] (HBASE-5877) When a query fails because the region has moved, let the regionserver return the new address to the client

Posted by "nkeywal (JIRA)" <ji...@apache.org>.

    [ https://issues.apache.org/jira/browse/HBASE-5877?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13273151#comment-13273151 ] 

nkeywal commented on HBASE-5877:
--------------------------------

Strange, the status is "failure", but actually the code is available in the trunk (which is good, it saves another merge)
                
> When a query fails because the region has moved, let the regionserver return the new address to the client
> ----------------------------------------------------------------------------------------------------------
>
>                 Key: HBASE-5877
>                 URL: https://issues.apache.org/jira/browse/HBASE-5877
>             Project: HBase
>          Issue Type: Improvement
>          Components: client, master, regionserver
>    Affects Versions: 0.96.0
>            Reporter: nkeywal
>            Assignee: nkeywal
>            Priority: Minor
>             Fix For: 0.96.0
>
>         Attachments: 5877-v16.txt, 5877-v17.txt, 5877-v17.txt, 5877.v1.patch, 5877.v12.patch, 5877.v15.patch, 5877.v18.patch, 5877.v18.patch, 5877.v18.patch, 5877.v6.patch
>
>
> This is mainly useful when we do a rolling restart. This will decrease the load on the master and the network load.
> Note that a region is not immediately opened after a close. So:
> - it seems preferable to wait before retrying on the other server. An optimisation would be to have an heuristic depending on when the region was closed.
> - during a rolling restart, the server moves the regions then stops. So we may have failures when the server is stopped, and this patch won't help.
> The implementation in the first patch does:
> - on the region move, there is an added parameter on the regionserver#close to say where we are sending the region
> - the regionserver keeps a list of what was moved. Each entry is kept 100 seconds.
> - the regionserver sends a specific exception when it receives a query on a moved region. This exception contains the new address.
> - the client analyses the exeptions and update its cache accordingly...

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira