Posted to issues@hbase.apache.org by "Sergey Shelukhin (JIRA)" <ji...@apache.org> on 2012/12/04 04:25:58 UTC
[jira] [Created] (HBASE-7268) correct local region location cache
information can be overwritten w/stale information from an old server
Sergey Shelukhin created HBASE-7268:
---------------------------------------
Summary: correct local region location cache information can be overwritten w/stale information from an old server
Key: HBASE-7268
URL: https://issues.apache.org/jira/browse/HBASE-7268
Project: HBase
Issue Type: Bug
Affects Versions: 0.96.0
Reporter: Sergey Shelukhin
Assignee: Sergey Shelukhin
Priority: Minor
Discovered via HBASE-7250; related to HBASE-5877.
Test is writing from multiple threads.
Server A has region R.
R gets moved from A to server B.
Server B gets killed.
Region gets moved by master to server C.
~15 seconds later, client tries to write to it.
Multiple client threads report "region moved from C to B", even though no such transition ever happened (neither in nor before the sequence described above).
I have a patch, but I am not sure it works; the test still fails locally for an as-yet-unknown reason.
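The sequence above can be replayed against a toy cache to show how a "C to B" report could arise. This is only a sketch under assumptions: `NaiveLocationCache` and `StaleRedirectReplay` are hypothetical names, not HBase classes, and it assumes another client thread learned about C from meta before the stale redirect from A arrived.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical stand-in for a client-side region location cache; not HBase code.
class NaiveLocationCache {
    private final Map<String, String> regionToServer = new HashMap<>();

    // Location read from meta: stored as-is.
    void onMetaLookup(String region, String server) {
        regionToServer.put(region, server);
    }

    // "Region moved" hint returned by a server that no longer hosts the region.
    // The naive version overwrites whatever is cached, with no freshness check.
    String onRegionMoved(String region, String newServer) {
        String previous = regionToServer.put(region, newServer);
        return region + " moved from " + previous + " to " + newServer;
    }

    String locate(String region) {
        return regionToServer.get(region);
    }
}

class StaleRedirectReplay {
    // Replays the reported sequence and returns the line the client would log.
    static String replay(NaiveLocationCache cache) {
        cache.onMetaLookup("R", "A");   // client knows R is on A
        // R moves from A to B, B is killed, the master reassigns R to C.
        cache.onMetaLookup("R", "C");   // assumption: another thread refreshes from meta
        // A thread still holding the old location writes to A; A only knows about B.
        return cache.onRegionMoved("R", "B");
    }

    public static void main(String[] args) {
        NaiveLocationCache cache = new NaiveLocationCache();
        System.out.println(replay(cache));                   // R moved from C to B
        System.out.println("cached: " + cache.locate("R"));  // cached: B
    }
}
```

The correct entry (C) ends up replaced by the dead server (B), which matches the symptom in the title.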
--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (HBASE-7268) correct local region location cache
information can be overwritten w/stale information from an old server
Posted by "Sergey Shelukhin (JIRA)" <ji...@apache.org>.
[ https://issues.apache.org/jira/browse/HBASE-7268?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13510798#comment-13510798 ]
Sergey Shelukhin commented on HBASE-7268:
-----------------------------------------
OK, I have a repro; I will attach the patch after cleaning up the debug logging, etc.
I'd prefer to have a timestamp (TS) in meta, but this is a simpler fix for now.
With the patch, the logging looks like this:
{code}
2012-12-05 12:06:08,285 DEBUG [Thread-521] util.ChaosMonkey$Action(203): Removing 13 regions from 10.10.11.17,53406,1354737903944
...
2012-12-05 12:06:08,765 INFO [am-zkevent-worker-pool-2-thread-2] master.RegionStates(249): Region {NAME => 'IntegrationTestRebalanceAndKillServersTargeted,7ffffff8,1354737916774.89483778064d05b1f2e1c0d20bcabc16.', STARTKEY => '7ffffff8', ENDKEY => '8cccccc4', ENCODED => 89483778064d05b1f2e1c0d20bcabc16,} transitioned from {IntegrationTestRebalanceAndKillServersTargeted,7ffffff8,1354737916774.89483778064d05b1f2e1c0d20bcabc16. state=PENDING_OPEN, ts=1354737968742, server=10.10.11.17,53407,1354737903960} to {IntegrationTestRebalanceAndKillServersTargeted,7ffffff8,1354737916774.89483778064d05b1f2e1c0d20bcabc16. state=OPENING, ts=1354737968765, server=10.10.11.17,53407,1354737903960}
...
2012-12-05 12:06:10,549 INFO [Thread-521] util.ChaosMonkey$Action(179): Killing region server:10.10.11.17,53407,1354737903960
...
2012-12-05 12:06:39,233 INFO [am-zkevent-worker-pool-2-thread-2] master.RegionStates(249): Region {NAME => 'IntegrationTestRebalanceAndKillServersTargeted,7ffffff8,1354737916774.89483778064d05b1f2e1c0d20bcabc16.', STARTKEY => '7ffffff8', ENDKEY => '8cccccc4', ENCODED => 89483778064d05b1f2e1c0d20bcabc16,} transitioned from {IntegrationTestRebalanceAndKillServersTargeted,7ffffff8,1354737916774.89483778064d05b1f2e1c0d20bcabc16. state=OPENING, ts=1354737999228, server=10.10.11.17,53404,1354737903902} to {IntegrationTestRebalanceAndKillServersTargeted,7ffffff8,1354737916774.89483778064d05b1f2e1c0d20bcabc16. state=OPEN, ts=1354737999232, server=10.10.11.17,53404,1354737903902}
...
2012-12-05 12:06:40,276 INFO [HBaseWriterThread_4] client.HConnectionManager$HConnectionImplementation(1776): Received an error from 10.10.11.17:53407 for region IntegrationTestRebalanceAndKillServersTargeted,7ffffff8,1354737916774.89483778064d05b1f2e1c0d20bcabc16.; not removing 10.10.11.17:53404 from cache.
...
2012-12-05 12:06:40,381 INFO [HBaseWriterThread_15] client.HConnectionManager$HConnectionImplementation(1809): Region IntegrationTestRebalanceAndKillServersTargeted,7ffffff8,1354737916774.89483778064d05b1f2e1c0d20bcabc16. moved to 10.10.11.17:53407 according to 10.10.11.17:53406
2012-12-05 12:06:40,381 DEBUG [HBaseWriterThread_15] client.HConnectionManager$HConnectionImplementation(1342): Ignoring stale location update for IntegrationTestRebalanceAndKillServersTargeted,7ffffff8,1354737916774.89483778064d05b1f2e1c0d20bcabc16.: 10.10.11.17:53407 at 1354737968725; local 10.10.11.17:53404 at 1354738000265
{code}
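The last log line suggests a freshness check before the cache is overwritten. A minimal sketch of such a check follows; `TimedLocation` and `FreshnessCheckedCache` are hypothetical names and this is not the code from the patch. It assumes every location carries a timestamp comparable across sources.

```java
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ConcurrentMap;

// Hypothetical value type: a server name plus the time the location was recorded.
final class TimedLocation {
    final String server;
    final long timestamp;

    TimedLocation(String server, long timestamp) {
        this.server = server;
        this.timestamp = timestamp;
    }
}

// Hypothetical cache that keeps only the newest known location per region.
class FreshnessCheckedCache {
    private final ConcurrentMap<String, TimedLocation> cache = new ConcurrentHashMap<>();

    // Returns true if the update was applied, false if it was ignored as stale.
    boolean update(String region, TimedLocation candidate) {
        // merge() is atomic per key, so concurrent writer threads cannot
        // interleave between the comparison and the store.
        TimedLocation winner = cache.merge(region, candidate,
            (current, incoming) -> incoming.timestamp > current.timestamp ? incoming : current);
        return winner == candidate;
    }

    TimedLocation get(String region) {
        return cache.get(region);
    }
}
```

With the values from the log, an entry for 10.10.11.17:53404 stamped 1354738000265 would survive an update for 10.10.11.17:53407 stamped 1354737968725, because the incoming timestamp is older.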
> correct local region location cache information can be overwritten w/stale information from an old server
> ---------------------------------------------------------------------------------------------------------
>
> Key: HBASE-7268
> URL: https://issues.apache.org/jira/browse/HBASE-7268
> Project: HBase
> Issue Type: Bug
> Affects Versions: 0.96.0
> Reporter: Sergey Shelukhin
> Assignee: Sergey Shelukhin
> Priority: Minor
>
> Discovered via HBASE-7250; related to HBASE-5877.
> Test is writing from multiple threads.
> Server A has region R; client knows that.
> R gets moved from A to server B.
> B gets killed.
> R gets moved by master to server C.
> ~15 seconds later, client tries to write to it (on A?).
> Multiple client threads report, from the RegionMoved exception processing logic, "R moved from C to B", even though no such transition ever happened (neither in nor before the sequence described above). I am not quite sure how the client learned of the transition to C; I assume it got it from meta via some other thread.
> Then the put fails (it may fail due to accumulated errors that are not logged, which I am investigating, but the bogus cache update is there notwithstanding).
> I have a patch, but I am not sure it works; the test still fails locally for an as-yet-unknown reason.
[jira] [Updated] (HBASE-7268) correct local region location cache
information can be overwritten w/stale information from an old server
Posted by "Sergey Shelukhin (JIRA)" <ji...@apache.org>.
[ https://issues.apache.org/jira/browse/HBASE-7268?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Sergey Shelukhin updated HBASE-7268:
------------------------------------
Description:
Discovered via HBASE-7250; related to HBASE-5877.
Test is writing from multiple threads.
Server A has region R; client knows that.
R gets moved from A to server B.
B gets killed.
R gets moved by master to server C.
~15 seconds later, client tries to write to it (on A?).
Multiple client threads report, from the RegionMoved exception processing logic, "R moved from C to B", even though no such transition ever happened (neither in nor before the sequence described above); then the put fails (it may fail due to accumulated errors that are not logged, which I am investigating, but the bogus cache update is there notwithstanding).
I have a patch, but I am not sure it works; the test still fails locally for an as-yet-unknown reason.
was:
Discovered via HBASE-7250; related to HBASE-5877.
Test is writing from multiple threads.
Server A has region R.
R gets moved from A to server B.
Server B gets killed.
Region gets moved by master to server C.
~15 seconds later, client tries to write to it.
Multiple client threads report "region moved from C to B", even though such transition never happened (neither in nor before the sequence described below).
I have a patch but not sure if it works, test still fails locally for yet unknown reason.
[jira] [Updated] (HBASE-7268) correct local region location cache
information can be overwritten w/stale information from an old server
Posted by "Sergey Shelukhin (JIRA)" <ji...@apache.org>.
[ https://issues.apache.org/jira/browse/HBASE-7268?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Sergey Shelukhin updated HBASE-7268:
------------------------------------
Attachment: HBASE-7268-v0.patch
Apparently the meta TS is already set, so the patch now uses it. I also made some of the logging more helpful.
[jira] [Commented] (HBASE-7268) correct local region location cache
information can be overwritten w/stale information from an old server
Posted by "Enis Soztutar (JIRA)" <ji...@apache.org>.
[ https://issues.apache.org/jira/browse/HBASE-7268?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13510920#comment-13510920 ]
Enis Soztutar commented on HBASE-7268:
--------------------------------------
bq. I'd prefer to have TS in meta but this is a simpler fix for now.
Can't we use the TS from the META Puts/Gets to avoid invalidating the client cache?
[jira] [Updated] (HBASE-7268) correct local region location cache
information can be overwritten w/stale information from an old server
Posted by "Sergey Shelukhin (JIRA)" <ji...@apache.org>.
[ https://issues.apache.org/jira/browse/HBASE-7268?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Sergey Shelukhin updated HBASE-7268:
------------------------------------
Description:
Discovered via HBASE-7250; related to HBASE-5877.
Test is writing from multiple threads.
Server A has region R; client knows that.
R gets moved from A to server B.
B gets killed.
R gets moved by master to server C.
~15 seconds later, client tries to write to it (on A?).
Multiple client threads report, from the RegionMoved exception processing logic, "R moved from C to B", even though no such transition ever happened (neither in nor before the sequence described above). I am not quite sure how the client learned of the transition to C; I assume it got it from meta via some other thread.
Then the put fails (it may fail due to accumulated errors that are not logged, which I am investigating, but the bogus cache update is there notwithstanding).
I have a patch, but I am not sure it works; the test still fails locally for an as-yet-unknown reason.
was:
Discovered via HBASE-7250; related to HBASE-5877.
Test is writing from multiple threads.
Server A has region R; client knows that.
R gets moved from A to server B.
B gets killed.
R gets moved by master to server C.
~15 seconds later, client tries to write to it (on A?).
Multiple client threads report from RegionMoved exception processing logic "R moved from C to B", even though such transition never happened (neither in nor before the sequence described below), then put fails (it may fail due to accumulated errors that are not logged, which I am investigating... but the bogus cache update is there nonwithstanding).
I have a patch but not sure if it works, test still fails locally for yet unknown reason.
[jira] [Commented] (HBASE-7268) correct local region location cache
information can be overwritten w/stale information from an old server
Posted by "Sergey Shelukhin (JIRA)" <ji...@apache.org>.
[ https://issues.apache.org/jira/browse/HBASE-7268?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13509969#comment-13509969 ]
Sergey Shelukhin commented on HBASE-7268:
-----------------------------------------
It's on a minicluster, so everything runs on the same machine. Do you have more information on the problem you had? Can it happen without a time difference?
Thanks.
[jira] [Updated] (HBASE-7268) correct local region location cache
information can be overwritten w/stale information from an old server
Posted by "Sergey Shelukhin (JIRA)" <ji...@apache.org>.
[ https://issues.apache.org/jira/browse/HBASE-7268?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Sergey Shelukhin updated HBASE-7268:
------------------------------------
Attachment: HBASE-7268-v0.patch
[jira] [Updated] (HBASE-7268) correct local region location cache
information can be overwritten w/stale information from an old server
Posted by "Sergey Shelukhin (JIRA)" <ji...@apache.org>.
[ https://issues.apache.org/jira/browse/HBASE-7268?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Sergey Shelukhin updated HBASE-7268:
------------------------------------
Status: Patch Available (was: Open)
[jira] [Commented] (HBASE-7268) correct local region location cache
information can be overwritten w/stale information from an old server
Posted by "nkeywal (JIRA)" <ji...@apache.org>.
[ https://issues.apache.org/jira/browse/HBASE-7268?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13509977#comment-13509977 ]
nkeywal commented on HBASE-7268:
--------------------------------
I think this affects 0.96 only. There is a heuristic for when a region moves: we send the new location back to the client, to save a call to meta. There are multiple cases in which this information will be stale: the region moved again, the server is dead, and so on. That's the limit with heuristics :-).
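One way to limit the damage from a stale hint is to accept a redirect only when it comes from the server the cache currently points at. The sketch below illustrates that idea and nothing more: `SourceCheckedCache` is a hypothetical class, not the HBase client, and this is not necessarily how the patch handles it.

```java
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ConcurrentMap;

// Hypothetical cache that only trusts hints and errors from the cached server.
class SourceCheckedCache {
    private final ConcurrentMap<String, String> regionToServer = new ConcurrentHashMap<>();

    void put(String region, String server) {
        regionToServer.put(region, server);
    }

    String get(String region) {
        return regionToServer.get(region);
    }

    // Applies a "region moved" hint only if the cache still points at the
    // server that reported the move. replace(key, expected, updated) is an
    // atomic compare-and-set on the entry.
    boolean applyRedirect(String region, String reportingServer, String hintedServer) {
        return regionToServer.replace(region, reportingServer, hintedServer);
    }

    // Drops the entry after an error only if the error came from the cached server.
    boolean evictOnError(String region, String failingServer) {
        return regionToServer.remove(region, failingServer);
    }
}
```

If the cache already holds C and old server A reports "moved to B", `applyRedirect("R", "A", "B")` returns false and C stays cached. The check does not help when the cache still points at A and the hinted server is already dead; in that case the client still has to fall back to meta.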
[jira] [Commented] (HBASE-7268) correct local region location cache
information can be overwritten w/stale information from an old server
Posted by "ramkrishna.s.vasudevan (JIRA)" <ji...@apache.org>.
[ https://issues.apache.org/jira/browse/HBASE-7268?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13509489#comment-13509489 ]
ramkrishna.s.vasudevan commented on HBASE-7268:
-----------------------------------------------
Could you check the meta update part? Is there a time difference between the machines? Just a guess; we once faced a similar problem.
[jira] [Commented] (HBASE-7268) correct local region location cache
information can be overwritten w/stale information from an old server
Posted by "Sergey Shelukhin (JIRA)" <ji...@apache.org>.
[ https://issues.apache.org/jira/browse/HBASE-7268?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13510093#comment-13510093 ]
Sergey Shelukhin commented on HBASE-7268:
-----------------------------------------
I was thinking of adding a timestamp to HRegionLocation; ideally it would be a global sequence number. I made a patch where the master provides it both via the open request, to be written to meta, and on close, to be put into the redirection response; however, that is far too many changes (including a meta change) for such a small issue. I also have a patch where a local TS is used in both cases (and stored in HRegionLocation). I'll add it once I figure out the other issues with HBASE-7250.
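The "global sequence" variant could look roughly like the sketch below. It assumes a single authority issues the numbers (here a hypothetical `AssignmentSequencer` standing in for the master); `SequencedLocation` is likewise a hypothetical class, not the real HRegionLocation.

```java
import java.util.concurrent.atomic.AtomicLong;

// Hypothetical location record stamped with a sequence number from one authority.
final class SequencedLocation {
    final String server;
    final long seqNum;

    SequencedLocation(String server, long seqNum) {
        this.server = server;
        this.seqNum = seqNum;
    }

    // A location replaces another only if it was issued later.
    boolean supersedes(SequencedLocation other) {
        return other == null || seqNum > other.seqNum;
    }
}

// Hypothetical stand-in for the master handing out one number per assignment.
class AssignmentSequencer {
    private final AtomicLong next = new AtomicLong();

    SequencedLocation assign(String server) {
        return new SequencedLocation(server, next.incrementAndGet());
    }
}
```

Because every number comes from the same counter, a location read from meta and a location carried in a redirection response can be compared directly, without relying on clocks of different servers agreeing. The local-TS variant gives up that guarantee in exchange for a much smaller change.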