Posted to issues@hbase.apache.org by "Sergey Shelukhin (JIRA)" <ji...@apache.org> on 2012/12/04 04:25:58 UTC

[jira] [Created] (HBASE-7268) correct local region location cache information can be overwritten w/stale information from an old server

Sergey Shelukhin created HBASE-7268:
---------------------------------------

             Summary: correct local region location cache information can be overwritten w/stale information from an old server
                 Key: HBASE-7268
                 URL: https://issues.apache.org/jira/browse/HBASE-7268
             Project: HBase
          Issue Type: Bug
    Affects Versions: 0.96.0
            Reporter: Sergey Shelukhin
            Assignee: Sergey Shelukhin
            Priority: Minor


Discovered via HBASE-7250; related to HBASE-5877.
Test is writing from multiple threads.
Server A has region R.
R gets moved from A to server B.
Server B gets killed.
Region gets moved by master to server C.
~15 seconds later, client tries to write to it.
Multiple client threads report "region moved from C to B", even though no such transition ever happened (neither in nor before the sequence described above).

I have a patch but am not sure if it works; the test still fails locally for an as-yet-unknown reason.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira

[jira] [Commented] (HBASE-7268) correct local region location cache information can be overwritten w/stale information from an old server

Posted by "Sergey Shelukhin (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/HBASE-7268?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13510798#comment-13510798 ] 

Sergey Shelukhin commented on HBASE-7268:
-----------------------------------------

OK, I got a repro... I will attach the patch after cleaning up debug logging, etc.
I'd prefer to have TS in meta but this is a simpler fix for now.
The logging with patch looks like this:
{code}
2012-12-05 12:06:08,285 DEBUG [Thread-521] util.ChaosMonkey$Action(203): Removing 13 regions from 10.10.11.17,53406,1354737903944
...
2012-12-05 12:06:08,765 INFO  [am-zkevent-worker-pool-2-thread-2] master.RegionStates(249): Region {NAME => 'IntegrationTestRebalanceAndKillServersTargeted,7ffffff8,1354737916774.89483778064d05b1f2e1c0d20bcabc16.', STARTKEY => '7ffffff8', ENDKEY => '8cccccc4', ENCODED => 89483778064d05b1f2e1c0d20bcabc16,} transitioned from {IntegrationTestRebalanceAndKillServersTargeted,7ffffff8,1354737916774.89483778064d05b1f2e1c0d20bcabc16. state=PENDING_OPEN, ts=1354737968742, server=10.10.11.17,53407,1354737903960} to {IntegrationTestRebalanceAndKillServersTargeted,7ffffff8,1354737916774.89483778064d05b1f2e1c0d20bcabc16. state=OPENING, ts=1354737968765, server=10.10.11.17,53407,1354737903960}
...
2012-12-05 12:06:10,549 INFO  [Thread-521] util.ChaosMonkey$Action(179): Killing region server:10.10.11.17,53407,1354737903960
...
2012-12-05 12:06:39,233 INFO  [am-zkevent-worker-pool-2-thread-2] master.RegionStates(249): Region {NAME => 'IntegrationTestRebalanceAndKillServersTargeted,7ffffff8,1354737916774.89483778064d05b1f2e1c0d20bcabc16.', STARTKEY => '7ffffff8', ENDKEY => '8cccccc4', ENCODED => 89483778064d05b1f2e1c0d20bcabc16,} transitioned from {IntegrationTestRebalanceAndKillServersTargeted,7ffffff8,1354737916774.89483778064d05b1f2e1c0d20bcabc16. state=OPENING, ts=1354737999228, server=10.10.11.17,53404,1354737903902} to {IntegrationTestRebalanceAndKillServersTargeted,7ffffff8,1354737916774.89483778064d05b1f2e1c0d20bcabc16. state=OPEN, ts=1354737999232, server=10.10.11.17,53404,1354737903902}
...
2012-12-05 12:06:40,276 INFO  [HBaseWriterThread_4] client.HConnectionManager$HConnectionImplementation(1776): Received an error from 10.10.11.17:53407 for region IntegrationTestRebalanceAndKillServersTargeted,7ffffff8,1354737916774.89483778064d05b1f2e1c0d20bcabc16.; not removing 10.10.11.17:53404 from cache.
...
2012-12-05 12:06:40,381 INFO  [HBaseWriterThread_15] client.HConnectionManager$HConnectionImplementation(1809): Region IntegrationTestRebalanceAndKillServersTargeted,7ffffff8,1354737916774.89483778064d05b1f2e1c0d20bcabc16. moved to 10.10.11.17:53407 according to 10.10.11.17:53406
2012-12-05 12:06:40,381 DEBUG [HBaseWriterThread_15] client.HConnectionManager$HConnectionImplementation(1342): Ignoring stale location update for IntegrationTestRebalanceAndKillServersTargeted,7ffffff8,1354737916774.89483778064d05b1f2e1c0d20bcabc16.: 10.10.11.17:53407 at 1354737968725; local 10.10.11.17:53404 at 1354738000265
{code}
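
The last log line above ("Ignoring stale location update ...") suggests a guard of roughly the following shape. This is only a minimal sketch, not the code in HBASE-7268-v0.patch: the names TimestampedLocation, GuardedLocationCache and updateIfNewer are hypothetical, and it assumes every location carries a timestamp that the client can compare.

```java
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ConcurrentMap;

// Hypothetical value type: a server address plus the time the location became valid.
final class TimestampedLocation {
    final String server;
    final long timestamp;

    TimestampedLocation(String server, long timestamp) {
        this.server = server;
        this.timestamp = timestamp;
    }
}

// Hypothetical client-side cache keyed by region name (not an HBase class).
final class GuardedLocationCache {
    private final ConcurrentMap<String, TimestampedLocation> cache = new ConcurrentHashMap<>();

    // Applies the update only if the candidate is at least as recent as the cached entry.
    // Returns true if the cache now holds the candidate, false if it was ignored as stale.
    boolean updateIfNewer(String region, TimestampedLocation candidate) {
        TimestampedLocation result = cache.merge(region, candidate,
            (current, incoming) -> incoming.timestamp >= current.timestamp ? incoming : current);
        return result == candidate;
    }

    TimestampedLocation get(String region) {
        return cache.get(region);
    }
}
```

With the values from the log, an update to 10.10.11.17:53407 at 1354737968725 would be rejected when the cache already holds 10.10.11.17:53404 at 1354738000265. The merge call keeps the compare-and-replace atomic, which matters because the test writes from multiple threads.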
                
> correct local region location cache information can be overwritten w/stale information from an old server
> ---------------------------------------------------------------------------------------------------------
>
>                 Key: HBASE-7268
>                 URL: https://issues.apache.org/jira/browse/HBASE-7268
>             Project: HBase
>          Issue Type: Bug
>    Affects Versions: 0.96.0
>            Reporter: Sergey Shelukhin
>            Assignee: Sergey Shelukhin
>            Priority: Minor
>
> Discovered via HBASE-7250; related to HBASE-5877.
> Test is writing from multiple threads.
> Server A has region R; client knows that.
> R gets moved from A to server B.
> B gets killed.
> R gets moved by master to server C.
> ~15 seconds later, client tries to write to it (on A?).
> Multiple client threads report, from the RegionMoved exception processing logic, "R moved from C to B", even though no such transition ever happened (neither in nor before the sequence described above). Not quite sure how the client learned of the transition to C; I assume it's from meta, via some other thread...
> Then, the put fails (it may fail due to accumulated errors that are not logged, which I am investigating... but the bogus cache update is there notwithstanding).
> I have a patch but am not sure if it works; the test still fails locally for an as-yet-unknown reason.


[jira] [Updated] (HBASE-7268) correct local region location cache information can be overwritten w/stale information from an old server

Posted by "Sergey Shelukhin (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/HBASE-7268?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Sergey Shelukhin updated HBASE-7268:
------------------------------------

    Description: 
Discovered via HBASE-7250; related to HBASE-5877.
Test is writing from multiple threads.
Server A has region R; client knows that.
R gets moved from A to server B.
B gets killed.
R gets moved by master to server C.
~15 seconds later, client tries to write to it (on A?).
Multiple client threads report, from the RegionMoved exception processing logic, "R moved from C to B", even though no such transition ever happened (neither in nor before the sequence described above); then the put fails (it may fail due to accumulated errors that are not logged, which I am investigating... but the bogus cache update is there notwithstanding).

I have a patch but am not sure if it works; the test still fails locally for an as-yet-unknown reason.

  was:
Discovered via HBASE-7250; related to HBASE-5877.
Test is writing from multiple threads.
Server A has region R.
R gets moved from A to server B.
Server B gets killed.
Region gets moved by master to server C.
~15 seconds later, client tries to write to it.
Multiple client threads report "region moved from C to B", even though no such transition ever happened (neither in nor before the sequence described above).

I have a patch but am not sure if it works; the test still fails locally for an as-yet-unknown reason.

    

[jira] [Updated] (HBASE-7268) correct local region location cache information can be overwritten w/stale information from an old server

Posted by "Sergey Shelukhin (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/HBASE-7268?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Sergey Shelukhin updated HBASE-7268:
------------------------------------

    Attachment: HBASE-7268-v0.patch

Apparently meta TS is set. Using it. Also made some logging more helpful.
                

[jira] [Commented] (HBASE-7268) correct local region location cache information can be overwritten w/stale information from an old server

Posted by "Enis Soztutar (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/HBASE-7268?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13510920#comment-13510920 ] 

Enis Soztutar commented on HBASE-7268:
--------------------------------------

bq. I'd prefer to have TS in meta but this is a simpler fix for now.
Can't we use the TS from the META Puts/Gets to avoid invalidating the client cache?
                

[jira] [Updated] (HBASE-7268) correct local region location cache information can be overwritten w/stale information from an old server

Posted by "Sergey Shelukhin (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/HBASE-7268?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Sergey Shelukhin updated HBASE-7268:
------------------------------------

    Description: 
Discovered via HBASE-7250; related to HBASE-5877.
Test is writing from multiple threads.
Server A has region R; client knows that.
R gets moved from A to server B.
B gets killed.
R gets moved by master to server C.
~15 seconds later, client tries to write to it (on A?).
Multiple client threads report, from the RegionMoved exception processing logic, "R moved from C to B", even though no such transition ever happened (neither in nor before the sequence described above). Not quite sure how the client learned of the transition to C; I assume it's from meta, via some other thread...
Then, the put fails (it may fail due to accumulated errors that are not logged, which I am investigating... but the bogus cache update is there notwithstanding).

I have a patch but am not sure if it works; the test still fails locally for an as-yet-unknown reason.

  was:
Discovered via HBASE-7250; related to HBASE-5877.
Test is writing from multiple threads.
Server A has region R; client knows that.
R gets moved from A to server B.
B gets killed.
R gets moved by master to server C.
~15 seconds later, client tries to write to it (on A?).
Multiple client threads report, from the RegionMoved exception processing logic, "R moved from C to B", even though no such transition ever happened (neither in nor before the sequence described above); then the put fails (it may fail due to accumulated errors that are not logged, which I am investigating... but the bogus cache update is there notwithstanding).

I have a patch but am not sure if it works; the test still fails locally for an as-yet-unknown reason.

    

[jira] [Commented] (HBASE-7268) correct local region location cache information can be overwritten w/stale information from an old server

Posted by "Sergey Shelukhin (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/HBASE-7268?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13509969#comment-13509969 ] 

Sergey Shelukhin commented on HBASE-7268:
-----------------------------------------

It's on a minicluster, so everything runs on the same machine. Do you have more information on the problem you had? Can it happen without a time difference?

Thanks.
                

[jira] [Updated] (HBASE-7268) correct local region location cache information can be overwritten w/stale information from an old server

Posted by "Sergey Shelukhin (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/HBASE-7268?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Sergey Shelukhin updated HBASE-7268:
------------------------------------

    Attachment: HBASE-7268-v0.patch
    

[jira] [Updated] (HBASE-7268) correct local region location cache information can be overwritten w/stale information from an old server

Posted by "Sergey Shelukhin (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/HBASE-7268?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Sergey Shelukhin updated HBASE-7268:
------------------------------------

    Status: Patch Available  (was: Open)
    

[jira] [Commented] (HBASE-7268) correct local region location cache information can be overwritten w/stale information from an old server

Posted by "nkeywal (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/HBASE-7268?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13509977#comment-13509977 ] 

nkeywal commented on HBASE-7268:
--------------------------------

I think this is for 0.96 only: there is a heuristic when a region moves: we send back the new location to the client, to save a call to meta. There are multiple cases in which this information will be stale: the region moved again, the server is dead, ... That's the limit with heuristics :-).
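
As an illustration of how that heuristic can clobber a correct cache entry, here is a minimal, self-contained sketch that replays the sequence from the issue description. Everything in it is hypothetical (FakeRegionServer, NaiveLocationCache, StaleRedirectDemo are not HBase classes), and it assumes, as the description guesses, that the correct location C reached the cache via a meta lookup from another thread.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical model of a region server that remembers where a closed region went.
final class FakeRegionServer {
    final String name;
    private final Map<String, String> movedTo = new HashMap<>();

    FakeRegionServer(String name) {
        this.name = name;
    }

    // Called when the region is moved away; the destination is only what was true at close time.
    void recordMove(String region, String destination) {
        movedTo.put(region, destination);
    }

    // Returns the redirect hint for a region this server no longer hosts, or null if it has none.
    String redirectFor(String region) {
        return movedTo.get(region);
    }
}

// Hypothetical client cache that trusts every redirect hint unconditionally.
final class NaiveLocationCache {
    private final Map<String, String> cache = new HashMap<>();

    void put(String region, String server) {
        cache.put(region, server);
    }

    String get(String region) {
        return cache.get(region);
    }

    // Mimics the heuristic: on a "region moved" reply, overwrite the cached location.
    void onRegionMoved(String region, String hintedServer) {
        cache.put(region, hintedServer);
    }
}

final class StaleRedirectDemo {
    // Replays the sequence from the issue description and returns the final cached server.
    static String replay() {
        FakeRegionServer a = new FakeRegionServer("A");
        NaiveLocationCache client = new NaiveLocationCache();

        a.recordMove("R", "B");   // R moves from A to B; A remembers "B"
        // B is killed and the master reassigns R to C; A is never told.
        client.put("R", "C");     // assumed: another thread read meta and cached the correct location
        // A slower thread still talks to A and receives A's outdated hint.
        client.onRegionMoved("R", a.redirectFor("R"));
        return client.get("R");
    }

    public static void main(String[] args) {
        System.out.println("cached location for R: " + replay());
    }
}
```

In this model the cache ends up pointing at the dead server B, which the client would perceive as "R moved from C to B" even though the master never made that transition.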
                

[jira] [Commented] (HBASE-7268) correct local region location cache information can be overwritten w/stale information from an old server

Posted by "ramkrishna.s.vasudevan (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/HBASE-7268?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13509489#comment-13509489 ] 

ramkrishna.s.vasudevan commented on HBASE-7268:
-----------------------------------------------

Could you check the meta update part?  Is there a clock difference between the machines?  Just a guess; we once faced a similar problem.
                

[jira] [Commented] (HBASE-7268) correct local region location cache information can be overwritten w/stale information from an old server

Posted by "Sergey Shelukhin (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/HBASE-7268?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13510093#comment-13510093 ] 

Sergey Shelukhin commented on HBASE-7268:
-----------------------------------------

I was thinking of adding a timestamp to HRegionLocation; ideally it would be a global sequence. I made a patch where the master provides it both via the open request, to be written to meta, and on close, to be put into the redirection response; however, this is far too many changes (including a meta change) for such a small issue. I have a patch where a local TS is used in both cases (and stored in HRegionLocation); I'll add it after I figure out the other issues with HBASE-7250.
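
For illustration only, the "global sequence" variant could look roughly like the sketch below. It assumes a single sequencer, standing in for the master, stamps every assignment; SequencedLocation and AssignmentSequencer are hypothetical names, and the real HRegionLocation is not modeled here.

```java
import java.util.concurrent.atomic.AtomicLong;

// Hypothetical location stamped with a sequence number instead of a wall-clock time.
final class SequencedLocation {
    final String server;
    final long seqNum;

    SequencedLocation(String server, long seqNum) {
        this.server = server;
        this.seqNum = seqNum;
    }

    // A location supersedes another only if the sequencer issued it later.
    boolean supersedes(SequencedLocation other) {
        return other == null || this.seqNum > other.seqNum;
    }
}

// Hypothetical stand-in for the master: one counter orders all assignments.
final class AssignmentSequencer {
    private final AtomicLong next = new AtomicLong();

    SequencedLocation assign(String server) {
        return new SequencedLocation(server, next.incrementAndGet());
    }
}
```

Because the order comes from one counter rather than from each machine's clock, the comparison does not depend on clocks agreeing across servers; the cost is that the number has to travel with both the meta write and the redirection response.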
                