Posted to issues@hbase.apache.org by "Matteo Bertozzi (JIRA)" <ji...@apache.org> on 2013/06/21 15:10:20 UTC

[jira] [Created] (HBASE-8783) RSSnapshotManager.ZKProcedureMemberRpcs may be initialized with the wrong server name

Matteo Bertozzi created HBASE-8783:
--------------------------------------

             Summary: RSSnapshotManager.ZKProcedureMemberRpcs may be initialized with the wrong server name
                 Key: HBASE-8783
                 URL: https://issues.apache.org/jira/browse/HBASE-8783
             Project: HBase
          Issue Type: Bug
          Components: snapshots
    Affects Versions: 0.95.1, 0.94.8
            Reporter: Matteo Bertozzi
            Assignee: Matteo Bertozzi
            Priority: Minor
             Fix For: 0.95.2, 0.94.9
         Attachments: HBASE-8783-0.94-v0.patch

The ZKProcedureMemberRpcs of the RegionServerSnapshotManager may be initialized with the wrong memberName.

{code}
2013-06-21 05:03:41,732 DEBUG org.apache.hadoop.hbase.regionserver.HRegionServer: Initialize Snapshot Manager
...
2013-06-21 05:03:41,875 INFO org.apache.hadoop.hbase.regionserver.HRegionServer: Master passed us hostname to use. Was=0.0.0.0, Now=srv-5.test.cloudera.com
{code}

The region server name is used as the memberName, but the snapshot manager is initialized before the RS receives the server name used by the master, so the ZKProcedureMemberRpcs will use the wrong name (0.0.0.0).
This will cause the snapshot to fail with a TimeoutException, since the master waits for the RS under the name it assigned and never sees that member join the acquired barrier.
{code}
Master:
ZKProcedureCoordinatorRpcs: Watching for acquire node:/hbase/online-snapshot/acquired/foo23/srv-5.test.cloudera.com,60020,1371813451915

RS:
ZKProcedureMemberRpcs: Member: '0.0.0.0,60020,1371814996779' joining acquired barrier for procedure (foo23) in zk

...
org.apache.hadoop.hbase.errorhandling.TimeoutException: Timeout elapsed! Source:Timeout caused Foreign Exception Start:1371798732141, End:1371798792141, diff:60000, max:60000 ms
{code}
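
The mismatch can be illustrated with a small standalone sketch. This is not the HBase implementation: the class MemberNameMismatch and the helpers memberName, acquiredNode and barrierSatisfied are hypothetical, and they only mirror the "host,port,startcode" names and znode paths visible in the logs above. The same start code is used on both sides so that the hostname is the only difference.

```java
import java.util.Set;

public class MemberNameMismatch {

    // Hypothetical helper: builds a name in the "host,port,startcode" form seen in the logs.
    static String memberName(String host, int port, long startCode) {
        return host + "," + port + "," + startCode;
    }

    // Hypothetical helper: builds an acquired-barrier znode path shaped like the one in the master log.
    static String acquiredNode(String baseZNode, String procedure, String member) {
        return baseZNode + "/acquired/" + procedure + "/" + member;
    }

    // The coordinator can only proceed once every expected member name has joined.
    static boolean barrierSatisfied(Set<String> expectedMembers, Set<String> joinedMembers) {
        return joinedMembers.containsAll(expectedMembers);
    }
}
```

With the hostname "0.0.0.0" on the RS side and "srv-5.test.cloudera.com" on the master side, acquiredNode yields two different paths and barrierSatisfied stays false, which matches the timeout in the trace. Building the member name only after the master has passed the hostname to the RS would make the two names agree.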
