Posted to issues@hbase.apache.org by "Ted Yu (JIRA)" <ji...@apache.org> on 2013/01/23 19:59:13 UTC

[jira] [Commented] (HBASE-7268) correct local region location cache information can be overwritten (or deleted) w/stale information from an old server

    [ https://issues.apache.org/jira/browse/HBASE-7268?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13560950#comment-13560950 ] 

Ted Yu commented on HBASE-7268:
-------------------------------

From https://builds.apache.org/job/HBase-TRUNK-on-Hadoop-2.0.0/366/testReport/junit/org.apache.hadoop.hbase.util/TestMiniClusterLoadSequential/loadTest_0_/, I found:
{code}
2013-01-22 03:16:55,763 ERROR [HBaseWriterThread_6] server.NIOServerCnxnFactory$1(44): Thread Thread[HBaseWriterThread_6,5,main] died
java.lang.NullPointerException
	at org.apache.hadoop.hbase.client.HConnectionManager$HConnectionImplementation.deleteCachedLocation(HConnectionManager.java:1783)
	at org.apache.hadoop.hbase.client.HConnectionManager$HConnectionImplementation.updateCachedLocations(HConnectionManager.java:1825)
	at org.apache.hadoop.hbase.client.HConnectionManager$HConnectionImplementation.access$1300(HConnectionManager.java:515)
	at org.apache.hadoop.hbase.client.HConnectionManager$HConnectionImplementation$Process.processBatchCallback(HConnectionManager.java:2035)
	at org.apache.hadoop.hbase.client.HConnectionManager$HConnectionImplementation$Process.access$900(HConnectionManager.java:1874)
	at org.apache.hadoop.hbase.client.HConnectionManager$HConnectionImplementation.processBatchCallback(HConnectionManager.java:1863)
	at org.apache.hadoop.hbase.client.HConnectionManager$HConnectionImplementation.processBatch(HConnectionManager.java:1842)
	at org.apache.hadoop.hbase.client.HTable.flushCommits(HTable.java:882)
	at org.apache.hadoop.hbase.client.HTable.doPut(HTable.java:692)
	at org.apache.hadoop.hbase.client.HTable.put(HTable.java:667)
	at org.apache.hadoop.hbase.util.MultiThreadedWriter$HBaseWriterThread.insert(MultiThreadedWriter.java:175)
	at org.apache.hadoop.hbase.util.MultiThreadedWriter$HBaseWriterThread.run(MultiThreadedWriter.java:145)
{code}
Looks like oldLocation was null in the following check:
{code}
        isStaleDelete = (source != null) && !oldLocation.equals(source);
{code}
Can you include the fix in the addendum?
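One way to make the check null-safe is `java.util.Objects.equals`, which treats two nulls as equal and never throws. The sketch below is one possible rewrite, not necessarily what the addendum committed. It models server identities as plain strings (an assumption; the real code compares location/server objects), and the class name `StaleDeleteCheck` is hypothetical.

```java
import java.util.Objects;

// Hypothetical helper illustrating a null-safe form of the isStaleDelete check.
// Server identities are modeled as strings for illustration only.
final class StaleDeleteCheck {

    /**
     * Returns true when a delete should be ignored because it was reported by a
     * server other than the one currently cached.
     *
     * @param oldLocation the cached server, or null if nothing is cached
     * @param source      the server that reported the error, or null if unknown
     */
    static boolean isStaleDelete(String oldLocation, String source) {
        // Original form: (source != null) && !oldLocation.equals(source)
        // That throws a NullPointerException when oldLocation is null.
        // With Objects.equals, a null cache entry simply compares unequal to a
        // non-null source, so the delete is skipped (there is nothing to delete).
        return source != null && !Objects.equals(oldLocation, source);
    }

    public static void main(String[] args) {
        System.out.println(isStaleDelete(null, "serverA"));      // true
        System.out.println(isStaleDelete("serverA", "serverA")); // false
        System.out.println(isStaleDelete("serverA", "serverB")); // true
        System.out.println(isStaleDelete("serverA", null));      // false
    }
}
```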
                
> correct local region location cache information can be overwritten (or deleted) w/stale information from an old server
> ----------------------------------------------------------------------------------------------------------------------
>
>                 Key: HBASE-7268
>                 URL: https://issues.apache.org/jira/browse/HBASE-7268
>             Project: HBase
>          Issue Type: Bug
>    Affects Versions: 0.96.0
>            Reporter: Sergey Shelukhin
>            Assignee: Sergey Shelukhin
>            Priority: Minor
>             Fix For: 0.96.0
>
>         Attachments: 7268-v6.patch, 7268-v8.patch, HBASE-7268-v0.patch, HBASE-7268-v0.patch, HBASE-7268-v1.patch, HBASE-7268-v2.patch, HBASE-7268-v2-plus-masterTs.patch, HBASE-7268-v2-plus-masterTs.patch, HBASE-7268-v3.patch, HBASE-7268-v4.patch, HBASE-7268-v5.patch, HBASE-7268-v6.patch, HBASE-7268-v7.patch, HBASE-7268-v8.patch, HBASE-7268-v9.patch
>
>
> Discovered via HBASE-7250; related to HBASE-5877.
> Test is writing from multiple threads.
> Server A has region R; client knows that.
> R gets moved from A to server B.
> B gets killed.
> R gets moved by master to server C.
> ~15 seconds later, client tries to write to it (on A?).
> Multiple client threads report "R moved from C to B" from the RegionMoved exception processing logic, even though such a transition never happened (neither in nor before the sequence described above). I'm not quite sure how the client learned of the transition to C; I assume it's from meta, via some other thread...
> Then, the put fails (it may fail due to accumulated errors that are not logged, which I am investigating... but the bogus cache update is there notwithstanding).
> I have a patch, but I'm not sure it works; the test still fails locally for an as-yet unknown reason.
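
The guard this issue calls for (ignore a cache update or delete that comes from a server other than the one currently cached) can be sketched with the atomic conditional operations of `java.util.concurrent.ConcurrentMap`. This is a simplified, hypothetical model: region and server names are strings, and `RegionLocationCache`, `applyMoved` and `deleteIfFrom` are invented for illustration, not the HBase client API. The `main` method replays the reported scenario, assuming the bogus update came from old server A: the client has R cached at C, and a report from A claiming R moved to B is rejected.

```java
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ConcurrentMap;

// Hypothetical, simplified client-side region location cache: region -> server.
final class RegionLocationCache {
    private final ConcurrentMap<String, String> cache = new ConcurrentHashMap<>();

    /** Unconditional update, e.g. after a fresh meta lookup. */
    void cacheFromMeta(String region, String server) {
        cache.put(region, server);
    }

    String get(String region) {
        return cache.get(region);
    }

    /**
     * Applies "region moved to newServer" as reported by source, but only if the
     * cache still points at source; otherwise the report is stale and ignored.
     * Returns true if the update was applied.
     */
    boolean applyMoved(String region, String source, String newServer) {
        // ConcurrentHashMap.replace rejects null arguments, so guard first.
        if (source == null || newServer == null) {
            return false;
        }
        // Atomic compare-and-set: succeeds only if the current value equals source.
        return cache.replace(region, source, newServer);
    }

    /** Deletes the cached location only if it still points at source. */
    boolean deleteIfFrom(String region, String source) {
        return source != null && cache.remove(region, source);
    }

    public static void main(String[] args) {
        RegionLocationCache c = new RegionLocationCache();
        c.cacheFromMeta("R", "C");                        // client learned R -> C from meta
        System.out.println(c.applyMoved("R", "A", "B"));  // false: A is not the cached server
        System.out.println(c.get("R"));                   // C
        System.out.println(c.deleteIfFrom("R", "A"));     // false
        System.out.println(c.applyMoved("R", "C", "D"));  // true
        System.out.println(c.get("R"));                   // D
    }
}
```

Using `replace(key, expected, newValue)` and `remove(key, expected)` makes the check and the update a single atomic step. That matters here because the test writes from multiple threads, and a separate get-then-put could still let a stale update win a race.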

--
This message is automatically generated by JIRA.