You are viewing a plain text version of this content. The canonical link for it is here.
Posted to issues@hbase.apache.org by "stack (JIRA)" <ji...@apache.org> on 2011/01/08 02:39:45 UTC

[jira] Created: (HBASE-3431) Regionserver is not using the name given it by the master; double entry in master listing of servers

Regionserver is not using the name given it by the master; double entry in master listing of servers
----------------------------------------------------------------------------------------------------

                 Key: HBASE-3431
                 URL: https://issues.apache.org/jira/browse/HBASE-3431
             Project: HBase
          Issue Type: Bug
    Affects Versions: 0.90.0
            Reporter: stack


Our man Ted Dunning found the following where RS checks in with one name, the master tells it use another name but we seem to go ahead and continue with our original name.

In RS logs I see:

{code}
2011-01-07 15:45:50,757 INFO  org.apache.hadoop.hbase.regionserver.HRegionServer [regionserver60020]: Master passed us address to use. Was=perfnode11:60020, Now=10.10.30.11:60020
{code}


On master I see

{code}
2011-01-07 15:45:38,613 INFO  org.apache.hadoop.hbase.master.ServerManager [IPC Server handler 0 on 60000]: Registering server=10.10.30.11,60020,1294443935414, regionCount=0, userLoad=false
{code}

....

then later

{code}
2011-01-07 15:45:44,247 INFO  org.apache.hadoop.hbase.master.ServerManager [IPC Server handler 2 on 60000]: Registering server=perfnode11,60020,1294443935414, regionCount=0, userLoad=true
{code}

This might be since we started letting servers register in other than with the reportStartup.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.


[jira] [Commented] (HBASE-3431) Regionserver is not using the name given it by the master; double entry in master listing of servers

Posted by "Hudson (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/HBASE-3431?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13030215#comment-13030215 ] 

Hudson commented on HBASE-3431:
-------------------------------

Integrated in HBase-TRUNK #1909 (See [https://builds.apache.org/hudson/job/HBase-TRUNK/1909/])
    

> Regionserver is not using the name given it by the master; double entry in master listing of servers
> ----------------------------------------------------------------------------------------------------
>
>                 Key: HBASE-3431
>                 URL: https://issues.apache.org/jira/browse/HBASE-3431
>             Project: HBase
>          Issue Type: Bug
>    Affects Versions: 0.90.0
>            Reporter: stack
>            Assignee: stack
>            Priority: Blocker
>             Fix For: 0.92.0
>
>         Attachments: 3431-v2.txt, 3431-v3.txt, 3431-v3.txt, 3431-v4.txt, 3431.txt
>
>
> Our man Ted Dunning found the following where RS checks in with one name, the master tells it use another name but we seem to go ahead and continue with our original name.
> In RS logs I see:
> {code}
> 2011-01-07 15:45:50,757 INFO  org.apache.hadoop.hbase.regionserver.HRegionServer [regionserver60020]: Master passed us address to use. Was=perfnode11:60020, Now=10.10.30.11:60020
> {code}
> On master I see
> {code}
> 2011-01-07 15:45:38,613 INFO  org.apache.hadoop.hbase.master.ServerManager [IPC Server handler 0 on 60000]: Registering server=10.10.30.11,60020,1294443935414, regionCount=0, userLoad=false
> {code}
> ....
> then later
> {code}
> 2011-01-07 15:45:44,247 INFO  org.apache.hadoop.hbase.master.ServerManager [IPC Server handler 2 on 60000]: Registering server=perfnode11,60020,1294443935414, regionCount=0, userLoad=true
> {code}
> This might be since we started letting servers register in other than with the reportStartup.

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira

[jira] Commented: (HBASE-3431) Regionserver is not using the name given it by the master; double entry in master listing of servers

Posted by "stack (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/HBASE-3431?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12990852#comment-12990852 ] 

stack commented on HBASE-3431:
------------------------------

If master can't find regionserver address, then master does this:

{code}
Caused by: java.lang.IllegalArgumentException: Could not resolve the DNS name of sv2borg185:60020
    at org.apache.hadoop.hbase.HServerAddress.checkBindAddressCanBeResolved(HServerAddress.java:105)
    at org.apache.hadoop.hbase.HServerAddress.readFields(HServerAddress.java:168)
    at org.apache.hadoop.hbase.HServerInfo.readFields(HServerInfo.java:230)
    at org.apache.hadoop.hbase.io.HbaseObjectWritable.readObject(HbaseObjectWritable.java:521)
    ... 8 more
{code}

... which is kinda dumb but means no progress unless server can get an address.

If DNS is wrong, e.g. on master, when it does a lookup on passed name, we come up w/ a different address, then we'll tell the regionserver go forward with the IP.

At moment you'll see two entries for this badly configured server.  The regionserver will show by its name and by its bad IP.

Symptom is you can't shutdown because master is waiting on the ghost server to finish its close up (this is what was happening for mr oracle.com).

I manufactured Ted's prob. by changing hosts on master to have different subnet for a server.  Then I got this in RS log:

{code}
2011-02-05 00:33:49,409 INFO org.apache.hadoop.hbase.regionserver.HRegionServer: Master passed us address to use. Was=sv2borg185:60020, Now=10.20.20.185:60020
{code}

Let me dig in.



> Regionserver is not using the name given it by the master; double entry in master listing of servers
> ----------------------------------------------------------------------------------------------------
>
>                 Key: HBASE-3431
>                 URL: https://issues.apache.org/jira/browse/HBASE-3431
>             Project: HBase
>          Issue Type: Bug
>    Affects Versions: 0.90.0
>            Reporter: stack
>            Assignee: stack
>            Priority: Blocker
>             Fix For: 0.90.1
>
>         Attachments: 3431.txt
>
>
> Our man Ted Dunning found the following where RS checks in with one name, the master tells it use another name but we seem to go ahead and continue with our original name.
> In RS logs I see:
> {code}
> 2011-01-07 15:45:50,757 INFO  org.apache.hadoop.hbase.regionserver.HRegionServer [regionserver60020]: Master passed us address to use. Was=perfnode11:60020, Now=10.10.30.11:60020
> {code}
> On master I see
> {code}
> 2011-01-07 15:45:38,613 INFO  org.apache.hadoop.hbase.master.ServerManager [IPC Server handler 0 on 60000]: Registering server=10.10.30.11,60020,1294443935414, regionCount=0, userLoad=false
> {code}
> ....
> then later
> {code}
> 2011-01-07 15:45:44,247 INFO  org.apache.hadoop.hbase.master.ServerManager [IPC Server handler 2 on 60000]: Registering server=perfnode11,60020,1294443935414, regionCount=0, userLoad=true
> {code}
> This might be since we started letting servers register in other than with the reportStartup.

-- 
This message is automatically generated by JIRA.
-
For more information on JIRA, see: http://www.atlassian.com/software/jira

        

[jira] [Updated] (HBASE-3431) Regionserver is not using the name given it by the master; double entry in master listing of servers

Posted by "stack (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/HBASE-3431?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

stack updated HBASE-3431:
-------------------------

    Resolution: Fixed
        Status: Resolved  (was: Patch Available)

Resolving.  I added to FAQ section on what to do if you are seeing double (the regionservers).  Resolving also because hbase-1502 changes how this all works so hopefully we don't see this type of issue anymore.

> Regionserver is not using the name given it by the master; double entry in master listing of servers
> ----------------------------------------------------------------------------------------------------
>
>                 Key: HBASE-3431
>                 URL: https://issues.apache.org/jira/browse/HBASE-3431
>             Project: HBase
>          Issue Type: Bug
>    Affects Versions: 0.90.0
>            Reporter: stack
>            Assignee: stack
>            Priority: Blocker
>             Fix For: 0.92.0
>
>         Attachments: 3431-v2.txt, 3431-v3.txt, 3431-v3.txt, 3431-v4.txt, 3431.txt
>
>
> Our man Ted Dunning found the following where RS checks in with one name, the master tells it use another name but we seem to go ahead and continue with our original name.
> In RS logs I see:
> {code}
> 2011-01-07 15:45:50,757 INFO  org.apache.hadoop.hbase.regionserver.HRegionServer [regionserver60020]: Master passed us address to use. Was=perfnode11:60020, Now=10.10.30.11:60020
> {code}
> On master I see
> {code}
> 2011-01-07 15:45:38,613 INFO  org.apache.hadoop.hbase.master.ServerManager [IPC Server handler 0 on 60000]: Registering server=10.10.30.11,60020,1294443935414, regionCount=0, userLoad=false
> {code}
> ....
> then later
> {code}
> 2011-01-07 15:45:44,247 INFO  org.apache.hadoop.hbase.master.ServerManager [IPC Server handler 2 on 60000]: Registering server=perfnode11,60020,1294443935414, regionCount=0, userLoad=true
> {code}
> This might be since we started letting servers register in other than with the reportStartup.

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira

[jira] Updated: (HBASE-3431) Regionserver is not using the name given it by the master; double entry in master listing of servers

Posted by "stack (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/HBASE-3431?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

stack updated HBASE-3431:
-------------------------

    Fix Version/s:     (was: 0.90.1)
                   0.92.0

Moving out.  J-D won't let me get away w/ my radical strip back of code; i'll break stuff thats currently 'working' (probably true).  I can't change the RPC because we're on a point version.  I can't change the way HSI and HSA work, because we are on a point version.  The only invariant in the communication between Master and RS is the server startcode (and port).  I could try and leverage this fact but looks like a bunch of work -- and this stuff is all going away in 0.92.

Let me add to the FAQ in the book what to do if user has double-vision.  That'll do for 0.90.1.

> Regionserver is not using the name given it by the master; double entry in master listing of servers
> ----------------------------------------------------------------------------------------------------
>
>                 Key: HBASE-3431
>                 URL: https://issues.apache.org/jira/browse/HBASE-3431
>             Project: HBase
>          Issue Type: Bug
>    Affects Versions: 0.90.0
>            Reporter: stack
>            Assignee: stack
>            Priority: Blocker
>             Fix For: 0.92.0
>
>         Attachments: 3431-v2.txt, 3431-v3.txt, 3431-v3.txt, 3431-v4.txt, 3431.txt
>
>
> Our man Ted Dunning found the following where RS checks in with one name, the master tells it use another name but we seem to go ahead and continue with our original name.
> In RS logs I see:
> {code}
> 2011-01-07 15:45:50,757 INFO  org.apache.hadoop.hbase.regionserver.HRegionServer [regionserver60020]: Master passed us address to use. Was=perfnode11:60020, Now=10.10.30.11:60020
> {code}
> On master I see
> {code}
> 2011-01-07 15:45:38,613 INFO  org.apache.hadoop.hbase.master.ServerManager [IPC Server handler 0 on 60000]: Registering server=10.10.30.11,60020,1294443935414, regionCount=0, userLoad=false
> {code}
> ....
> then later
> {code}
> 2011-01-07 15:45:44,247 INFO  org.apache.hadoop.hbase.master.ServerManager [IPC Server handler 2 on 60000]: Registering server=perfnode11,60020,1294443935414, regionCount=0, userLoad=true
> {code}
> This might be since we started letting servers register in other than with the reportStartup.

-- 
This message is automatically generated by JIRA.
-
For more information on JIRA, see: http://www.atlassian.com/software/jira

        

[jira] Commented: (HBASE-3431) Regionserver is not using the name given it by the master; double entry in master listing of servers

Posted by "stack (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/HBASE-3431?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12991094#comment-12991094 ] 

stack commented on HBASE-3431:
------------------------------

If RS passes 127.0.0.1, then thats what its bound too and no (remote) client will be able to connect.  Its broke.

The fixup in master would let this (broke) server successfully register.  The master would call remoteIP on the connected socket to get the RSs' address and it would then know the RS as this.  This would happen only on startup, in reportForDuty, not subsequently during heartbeating; we only do the lookup of remoteip on reportForDuty.

Heartbeating, the RS was supposed to be volunteering the HServerInfo that the Master had passed it back as response to the reportForDuty.

Since 0.90.0, servers can register at heartbeat time.  This is because masters can join an already running cluster.  The RSs do not rerun the reportForDuty step.  They just start heartbeating the new Master.

We could I suppose add lookup on the sockets remoteip to heartbeating too with reverse lookup.

I'm thinking its better to just strip all this crap out.

> Regionserver is not using the name given it by the master; double entry in master listing of servers
> ----------------------------------------------------------------------------------------------------
>
>                 Key: HBASE-3431
>                 URL: https://issues.apache.org/jira/browse/HBASE-3431
>             Project: HBase
>          Issue Type: Bug
>    Affects Versions: 0.90.0
>            Reporter: stack
>            Assignee: stack
>            Priority: Blocker
>             Fix For: 0.90.1
>
>         Attachments: 3431-v2.txt, 3431-v3.txt, 3431-v3.txt, 3431.txt
>
>
> Our man Ted Dunning found the following where RS checks in with one name, the master tells it use another name but we seem to go ahead and continue with our original name.
> In RS logs I see:
> {code}
> 2011-01-07 15:45:50,757 INFO  org.apache.hadoop.hbase.regionserver.HRegionServer [regionserver60020]: Master passed us address to use. Was=perfnode11:60020, Now=10.10.30.11:60020
> {code}
> On master I see
> {code}
> 2011-01-07 15:45:38,613 INFO  org.apache.hadoop.hbase.master.ServerManager [IPC Server handler 0 on 60000]: Registering server=10.10.30.11,60020,1294443935414, regionCount=0, userLoad=false
> {code}
> ....
> then later
> {code}
> 2011-01-07 15:45:44,247 INFO  org.apache.hadoop.hbase.master.ServerManager [IPC Server handler 2 on 60000]: Registering server=perfnode11,60020,1294443935414, regionCount=0, userLoad=true
> {code}
> This might be since we started letting servers register in other than with the reportStartup.

-- 
This message is automatically generated by JIRA.
-
For more information on JIRA, see: http://www.atlassian.com/software/jira

        

[jira] Commented: (HBASE-3431) Regionserver is not using the name given it by the master; double entry in master listing of servers

Posted by "ryan rawson (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/HBASE-3431?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12991217#comment-12991217 ] 

ryan rawson commented on HBASE-3431:
------------------------------------

I'll have a look monday


> Regionserver is not using the name given it by the master; double entry in master listing of servers
> ----------------------------------------------------------------------------------------------------
>
>                 Key: HBASE-3431
>                 URL: https://issues.apache.org/jira/browse/HBASE-3431
>             Project: HBase
>          Issue Type: Bug
>    Affects Versions: 0.90.0
>            Reporter: stack
>            Assignee: stack
>            Priority: Blocker
>             Fix For: 0.90.1
>
>         Attachments: 3431-v2.txt, 3431-v3.txt, 3431-v3.txt, 3431-v4.txt, 3431.txt
>
>
> Our man Ted Dunning found the following where RS checks in with one name, the master tells it use another name but we seem to go ahead and continue with our original name.
> In RS logs I see:
> {code}
> 2011-01-07 15:45:50,757 INFO  org.apache.hadoop.hbase.regionserver.HRegionServer [regionserver60020]: Master passed us address to use. Was=perfnode11:60020, Now=10.10.30.11:60020
> {code}
> On master I see
> {code}
> 2011-01-07 15:45:38,613 INFO  org.apache.hadoop.hbase.master.ServerManager [IPC Server handler 0 on 60000]: Registering server=10.10.30.11,60020,1294443935414, regionCount=0, userLoad=false
> {code}
> ....
> then later
> {code}
> 2011-01-07 15:45:44,247 INFO  org.apache.hadoop.hbase.master.ServerManager [IPC Server handler 2 on 60000]: Registering server=perfnode11,60020,1294443935414, regionCount=0, userLoad=true
> {code}
> This might be since we started letting servers register in other than with the reportStartup.

-- 
This message is automatically generated by JIRA.
-
For more information on JIRA, see: http://www.atlassian.com/software/jira

        

[jira] Commented: (HBASE-3431) Regionserver is not using the name given it by the master; double entry in master listing of servers

Posted by "ryan rawson (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/HBASE-3431?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12991612#comment-12991612 ] 

ryan rawson commented on HBASE-3431:
------------------------------------

one thing to consider is a lot of the network code attempts to figure
out what is the 'primary ip' then bind to just that IP.

would it make sense to bind to * instead? (ie: 0.0.0.0) Why not accept
RPCs on all interfaces? If security is a concern, I think SASL and
host level firewall controls are a better way to address that, rather
than bake it in HBase.  That way it won't really "matter" what our IP
is, whatever IP the master 'sees' us as could be used as what to stuff
in the META.  Then we could use the registration name to identify dead
hosts, etc, etc.


> Regionserver is not using the name given it by the master; double entry in master listing of servers
> ----------------------------------------------------------------------------------------------------
>
>                 Key: HBASE-3431
>                 URL: https://issues.apache.org/jira/browse/HBASE-3431
>             Project: HBase
>          Issue Type: Bug
>    Affects Versions: 0.90.0
>            Reporter: stack
>            Assignee: stack
>            Priority: Blocker
>             Fix For: 0.90.1
>
>         Attachments: 3431-v2.txt, 3431-v3.txt, 3431-v3.txt, 3431-v4.txt, 3431.txt
>
>
> Our man Ted Dunning found the following where RS checks in with one name, the master tells it use another name but we seem to go ahead and continue with our original name.
> In RS logs I see:
> {code}
> 2011-01-07 15:45:50,757 INFO  org.apache.hadoop.hbase.regionserver.HRegionServer [regionserver60020]: Master passed us address to use. Was=perfnode11:60020, Now=10.10.30.11:60020
> {code}
> On master I see
> {code}
> 2011-01-07 15:45:38,613 INFO  org.apache.hadoop.hbase.master.ServerManager [IPC Server handler 0 on 60000]: Registering server=10.10.30.11,60020,1294443935414, regionCount=0, userLoad=false
> {code}
> ....
> then later
> {code}
> 2011-01-07 15:45:44,247 INFO  org.apache.hadoop.hbase.master.ServerManager [IPC Server handler 2 on 60000]: Registering server=perfnode11,60020,1294443935414, regionCount=0, userLoad=true
> {code}
> This might be since we started letting servers register in other than with the reportStartup.

-- 
This message is automatically generated by JIRA.
-
For more information on JIRA, see: http://www.atlassian.com/software/jira

        

[jira] Commented: (HBASE-3431) Regionserver is not using the name given it by the master; double entry in master listing of servers

Posted by "Jean-Daniel Cryans (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/HBASE-3431?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12991081#comment-12991081 ] 

Jean-Daniel Cryans commented on HBASE-3431:
-------------------------------------------

bq. Instead Master just uses the ServerName the RS volunteered.

So what happens if region server passes 127.0.0.1?

> Regionserver is not using the name given it by the master; double entry in master listing of servers
> ----------------------------------------------------------------------------------------------------
>
>                 Key: HBASE-3431
>                 URL: https://issues.apache.org/jira/browse/HBASE-3431
>             Project: HBase
>          Issue Type: Bug
>    Affects Versions: 0.90.0
>            Reporter: stack
>            Assignee: stack
>            Priority: Blocker
>             Fix For: 0.90.1
>
>         Attachments: 3431-v2.txt, 3431-v3.txt, 3431-v3.txt, 3431.txt
>
>
> Our man Ted Dunning found the following where RS checks in with one name, the master tells it use another name but we seem to go ahead and continue with our original name.
> In RS logs I see:
> {code}
> 2011-01-07 15:45:50,757 INFO  org.apache.hadoop.hbase.regionserver.HRegionServer [regionserver60020]: Master passed us address to use. Was=perfnode11:60020, Now=10.10.30.11:60020
> {code}
> On master I see
> {code}
> 2011-01-07 15:45:38,613 INFO  org.apache.hadoop.hbase.master.ServerManager [IPC Server handler 0 on 60000]: Registering server=10.10.30.11,60020,1294443935414, regionCount=0, userLoad=false
> {code}
> ....
> then later
> {code}
> 2011-01-07 15:45:44,247 INFO  org.apache.hadoop.hbase.master.ServerManager [IPC Server handler 2 on 60000]: Registering server=perfnode11,60020,1294443935414, regionCount=0, userLoad=true
> {code}
> This might be since we started letting servers register in other than with the reportStartup.

-- 
This message is automatically generated by JIRA.
-
For more information on JIRA, see: http://www.atlassian.com/software/jira

        

[jira] Commented: (HBASE-3431) Regionserver is not using the name given it by the master; double entry in master listing of servers

Posted by "stack (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/HBASE-3431?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12987334#action_12987334 ] 

stack commented on HBASE-3431:
------------------------------

Up on IRC we just had case where RS was reporting hostname only but reverse lookup was return FQDN.

> Regionserver is not using the name given it by the master; double entry in master listing of servers
> ----------------------------------------------------------------------------------------------------
>
>                 Key: HBASE-3431
>                 URL: https://issues.apache.org/jira/browse/HBASE-3431
>             Project: HBase
>          Issue Type: Bug
>    Affects Versions: 0.90.0
>            Reporter: stack
>            Priority: Blocker
>             Fix For: 0.90.1
>
>         Attachments: 3431.txt
>
>
> Our man Ted Dunning found the following where RS checks in with one name, the master tells it use another name but we seem to go ahead and continue with our original name.
> In RS logs I see:
> {code}
> 2011-01-07 15:45:50,757 INFO  org.apache.hadoop.hbase.regionserver.HRegionServer [regionserver60020]: Master passed us address to use. Was=perfnode11:60020, Now=10.10.30.11:60020
> {code}
> On master I see
> {code}
> 2011-01-07 15:45:38,613 INFO  org.apache.hadoop.hbase.master.ServerManager [IPC Server handler 0 on 60000]: Registering server=10.10.30.11,60020,1294443935414, regionCount=0, userLoad=false
> {code}
> ....
> then later
> {code}
> 2011-01-07 15:45:44,247 INFO  org.apache.hadoop.hbase.master.ServerManager [IPC Server handler 2 on 60000]: Registering server=perfnode11,60020,1294443935414, regionCount=0, userLoad=true
> {code}
> This might be since we started letting servers register in other than with the reportStartup.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.


[jira] Commented: (HBASE-3431) Regionserver is not using the name given it by the master; double entry in master listing of servers

Posted by "stack (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/HBASE-3431?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12979087#action_12979087 ] 

stack commented on HBASE-3431:
------------------------------

Seems like this is a regression since 0.89.  Ted says 0.89 works on his cluster.  The master is seeing RS as an IP then subsequently the RS is giving the IP back as its 'name'.  Ted is also starting things a little odd... manually starting each daemon... with the RS saying that its NotReadyYet exception in 0.90.

> Regionserver is not using the name given it by the master; double entry in master listing of servers
> ----------------------------------------------------------------------------------------------------
>
>                 Key: HBASE-3431
>                 URL: https://issues.apache.org/jira/browse/HBASE-3431
>             Project: HBase
>          Issue Type: Bug
>    Affects Versions: 0.90.0
>            Reporter: stack
>
> Our man Ted Dunning found the following where RS checks in with one name, the master tells it use another name but we seem to go ahead and continue with our original name.
> In RS logs I see:
> {code}
> 2011-01-07 15:45:50,757 INFO  org.apache.hadoop.hbase.regionserver.HRegionServer [regionserver60020]: Master passed us address to use. Was=perfnode11:60020, Now=10.10.30.11:60020
> {code}
> On master I see
> {code}
> 2011-01-07 15:45:38,613 INFO  org.apache.hadoop.hbase.master.ServerManager [IPC Server handler 0 on 60000]: Registering server=10.10.30.11,60020,1294443935414, regionCount=0, userLoad=false
> {code}
> ....
> then later
> {code}
> 2011-01-07 15:45:44,247 INFO  org.apache.hadoop.hbase.master.ServerManager [IPC Server handler 2 on 60000]: Registering server=perfnode11,60020,1294443935414, regionCount=0, userLoad=true
> {code}
> This might be since we started letting servers register in other than with the reportStartup.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.


[jira] Commented: (HBASE-3431) Regionserver is not using the name given it by the master; double entry in master listing of servers

Posted by "stack (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/HBASE-3431?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12979056#action_12979056 ] 

stack commented on HBASE-3431:
------------------------------

Why is RS not taking what the Master tells it use when gong to the Master?

> Regionserver is not using the name given it by the master; double entry in master listing of servers
> ----------------------------------------------------------------------------------------------------
>
>                 Key: HBASE-3431
>                 URL: https://issues.apache.org/jira/browse/HBASE-3431
>             Project: HBase
>          Issue Type: Bug
>    Affects Versions: 0.90.0
>            Reporter: stack
>
> Our man Ted Dunning found the following where RS checks in with one name, the master tells it use another name but we seem to go ahead and continue with our original name.
> In RS logs I see:
> {code}
> 2011-01-07 15:45:50,757 INFO  org.apache.hadoop.hbase.regionserver.HRegionServer [regionserver60020]: Master passed us address to use. Was=perfnode11:60020, Now=10.10.30.11:60020
> {code}
> On master I see
> {code}
> 2011-01-07 15:45:38,613 INFO  org.apache.hadoop.hbase.master.ServerManager [IPC Server handler 0 on 60000]: Registering server=10.10.30.11,60020,1294443935414, regionCount=0, userLoad=false
> {code}
> ....
> then later
> {code}
> 2011-01-07 15:45:44,247 INFO  org.apache.hadoop.hbase.master.ServerManager [IPC Server handler 2 on 60000]: Registering server=perfnode11,60020,1294443935414, regionCount=0, userLoad=true
> {code}
> This might be since we started letting servers register in other than with the reportStartup.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.


[jira] Updated: (HBASE-3431) Regionserver is not using the name given it by the master; double entry in master listing of servers

Posted by "stack (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/HBASE-3431?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

stack updated HBASE-3431:
-------------------------

    Status: Patch Available  (was: Open)

> Regionserver is not using the name given it by the master; double entry in master listing of servers
> ----------------------------------------------------------------------------------------------------
>
>                 Key: HBASE-3431
>                 URL: https://issues.apache.org/jira/browse/HBASE-3431
>             Project: HBase
>          Issue Type: Bug
>    Affects Versions: 0.90.0
>            Reporter: stack
>            Assignee: stack
>            Priority: Blocker
>             Fix For: 0.90.1
>
>         Attachments: 3431-v2.txt, 3431-v3.txt, 3431-v3.txt, 3431-v4.txt, 3431.txt
>
>
> Our man Ted Dunning found the following where RS checks in with one name, the master tells it use another name but we seem to go ahead and continue with our original name.
> In RS logs I see:
> {code}
> 2011-01-07 15:45:50,757 INFO  org.apache.hadoop.hbase.regionserver.HRegionServer [regionserver60020]: Master passed us address to use. Was=perfnode11:60020, Now=10.10.30.11:60020
> {code}
> On master I see
> {code}
> 2011-01-07 15:45:38,613 INFO  org.apache.hadoop.hbase.master.ServerManager [IPC Server handler 0 on 60000]: Registering server=10.10.30.11,60020,1294443935414, regionCount=0, userLoad=false
> {code}
> ....
> then later
> {code}
> 2011-01-07 15:45:44,247 INFO  org.apache.hadoop.hbase.master.ServerManager [IPC Server handler 2 on 60000]: Registering server=perfnode11,60020,1294443935414, regionCount=0, userLoad=true
> {code}
> This might be since we started letting servers register in other than with the reportStartup.

-- 
This message is automatically generated by JIRA.
-
For more information on JIRA, see: http://www.atlassian.com/software/jira

        

[jira] Commented: (HBASE-3431) Regionserver is not using the name given it by the master; double entry in master listing of servers

Posted by "stack (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/HBASE-3431?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12991610#comment-12991610 ] 

stack commented on HBASE-3431:
------------------------------

Chatted w/ Jon and J-D on this.  Jon suggests EnvironmentEdgeManager utility as means of intercepting lookups so we can do up tests returning different answers.  Let me try it out.  J-D rehearsed issues w/ have had in here over time and that this 'mess' was 'working' in 0.20.x and even unto 0.89.x (He remembers also that a RS can volunteer its address as 127.0.0.1 but actually bind to real, non-localhost address somehow).  He's wary about stripping it all out as the patch does.  Let me try and put up unit tests that can mock the various scenarios.

Looking at code w/ J-D, we turned up one problematic bit of code -- HSA will create a new InetSocketAddress on deserialization which can result in a lookup.

Looking in hdfs, datanode generates a registration name -- e.g. DS-198919343-10.20.20.187-10010-1291133524722 -- and this is how it identifies itself to NN regardless.  No messing w/ NN telling it what name to use.   TT does something similar.

> Regionserver is not using the name given it by the master; double entry in master listing of servers
> ----------------------------------------------------------------------------------------------------
>
>                 Key: HBASE-3431
>                 URL: https://issues.apache.org/jira/browse/HBASE-3431
>             Project: HBase
>          Issue Type: Bug
>    Affects Versions: 0.90.0
>            Reporter: stack
>            Assignee: stack
>            Priority: Blocker
>             Fix For: 0.90.1
>
>         Attachments: 3431-v2.txt, 3431-v3.txt, 3431-v3.txt, 3431-v4.txt, 3431.txt
>
>
> Our man Ted Dunning found the following where RS checks in with one name, the master tells it use another name but we seem to go ahead and continue with our original name.
> In RS logs I see:
> {code}
> 2011-01-07 15:45:50,757 INFO  org.apache.hadoop.hbase.regionserver.HRegionServer [regionserver60020]: Master passed us address to use. Was=perfnode11:60020, Now=10.10.30.11:60020
> {code}
> On master I see
> {code}
> 2011-01-07 15:45:38,613 INFO  org.apache.hadoop.hbase.master.ServerManager [IPC Server handler 0 on 60000]: Registering server=10.10.30.11,60020,1294443935414, regionCount=0, userLoad=false
> {code}
> ....
> then later
> {code}
> 2011-01-07 15:45:44,247 INFO  org.apache.hadoop.hbase.master.ServerManager [IPC Server handler 2 on 60000]: Registering server=perfnode11,60020,1294443935414, regionCount=0, userLoad=true
> {code}
> This might be since we started letting servers register in other than with the reportStartup.

-- 
This message is automatically generated by JIRA.
-
For more information on JIRA, see: http://www.atlassian.com/software/jira

        

[jira] Commented: (HBASE-3431) Regionserver is not using the name given it by the master; double entry in master listing of servers

Posted by "stack (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/HBASE-3431?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12991662#comment-12991662 ] 

stack commented on HBASE-3431:
------------------------------

bq. Looking in hdfs, datanode generates a registration name – e.g. DS-198919343-10.20.20.187-10010-1291133524722 – and this is how it identifies itself to NN regardless. No messing w/ NN telling it what name to use. 

J-D points out that I'm reading this code lazily (i.e. wrong), that on registration, the NN returns a DataRegistration instance that the DN will use going forward.

> Regionserver is not using the name given it by the master; double entry in master listing of servers
> ----------------------------------------------------------------------------------------------------
>
>                 Key: HBASE-3431
>                 URL: https://issues.apache.org/jira/browse/HBASE-3431
>             Project: HBase
>          Issue Type: Bug
>    Affects Versions: 0.90.0
>            Reporter: stack
>            Assignee: stack
>            Priority: Blocker
>             Fix For: 0.90.1
>
>         Attachments: 3431-v2.txt, 3431-v3.txt, 3431-v3.txt, 3431-v4.txt, 3431.txt
>
>
> Our man Ted Dunning found the following where RS checks in with one name, the master tells it use another name but we seem to go ahead and continue with our original name.
> In RS logs I see:
> {code}
> 2011-01-07 15:45:50,757 INFO  org.apache.hadoop.hbase.regionserver.HRegionServer [regionserver60020]: Master passed us address to use. Was=perfnode11:60020, Now=10.10.30.11:60020
> {code}
> On master I see
> {code}
> 2011-01-07 15:45:38,613 INFO  org.apache.hadoop.hbase.master.ServerManager [IPC Server handler 0 on 60000]: Registering server=10.10.30.11,60020,1294443935414, regionCount=0, userLoad=false
> {code}
> ....
> then later
> {code}
> 2011-01-07 15:45:44,247 INFO  org.apache.hadoop.hbase.master.ServerManager [IPC Server handler 2 on 60000]: Registering server=perfnode11,60020,1294443935414, regionCount=0, userLoad=true
> {code}
> This might be since we started letting servers register in other than with the reportStartup.

-- 
This message is automatically generated by JIRA.
-
For more information on JIRA, see: http://www.atlassian.com/software/jira

       

[jira] Updated: (HBASE-3431) Regionserver is not using the name given it by the master; double entry in master listing of servers

Posted by "stack (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/HBASE-3431?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

stack updated HBASE-3431:
-------------------------

    Attachment: 3431-v3.txt

Should have stripped setting IP of RS on Master side when doing startup message.  More breaking of DNS turned up this one.  Still testing.

> Regionserver is not using the name given it by the master; double entry in master listing of servers
> ----------------------------------------------------------------------------------------------------
>
>                 Key: HBASE-3431
>                 URL: https://issues.apache.org/jira/browse/HBASE-3431
>             Project: HBase
>          Issue Type: Bug
>    Affects Versions: 0.90.0
>            Reporter: stack
>            Assignee: stack
>            Priority: Blocker
>             Fix For: 0.90.1
>
>         Attachments: 3431-v2.txt, 3431-v3.txt, 3431.txt
>
>
> Our man Ted Dunning found the following where RS checks in with one name, the master tells it use another name but we seem to go ahead and continue with our original name.
> In RS logs I see:
> {code}
> 2011-01-07 15:45:50,757 INFO  org.apache.hadoop.hbase.regionserver.HRegionServer [regionserver60020]: Master passed us address to use. Was=perfnode11:60020, Now=10.10.30.11:60020
> {code}
> On master I see
> {code}
> 2011-01-07 15:45:38,613 INFO  org.apache.hadoop.hbase.master.ServerManager [IPC Server handler 0 on 60000]: Registering server=10.10.30.11,60020,1294443935414, regionCount=0, userLoad=false
> {code}
> ....
> then later
> {code}
> 2011-01-07 15:45:44,247 INFO  org.apache.hadoop.hbase.master.ServerManager [IPC Server handler 2 on 60000]: Registering server=perfnode11,60020,1294443935414, regionCount=0, userLoad=true
> {code}
> This might be since we started letting servers register in other than with the reportStartup.

-- 
This message is automatically generated by JIRA.
-
For more information on JIRA, see: http://www.atlassian.com/software/jira

        

[jira] Updated: (HBASE-3431) Regionserver is not using the name given it by the master; double entry in master listing of servers

Posted by "stack (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/HBASE-3431?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

stack updated HBASE-3431:
-------------------------

         Priority: Blocker  (was: Major)
    Fix Version/s: 0.90.1

We are seeing lots of permutations on this issue up in mailing lists.  Lets fix this in a 0.90.1.

> Regionserver is not using the name given it by the master; double entry in master listing of servers
> ----------------------------------------------------------------------------------------------------
>
>                 Key: HBASE-3431
>                 URL: https://issues.apache.org/jira/browse/HBASE-3431
>             Project: HBase
>          Issue Type: Bug
>    Affects Versions: 0.90.0
>            Reporter: stack
>            Priority: Blocker
>             Fix For: 0.90.1
>
>         Attachments: 3431.txt
>
>
> Our man Ted Dunning found the following where RS checks in with one name, the master tells it use another name but we seem to go ahead and continue with our original name.
> In RS logs I see:
> {code}
> 2011-01-07 15:45:50,757 INFO  org.apache.hadoop.hbase.regionserver.HRegionServer [regionserver60020]: Master passed us address to use. Was=perfnode11:60020, Now=10.10.30.11:60020
> {code}
> On master I see
> {code}
> 2011-01-07 15:45:38,613 INFO  org.apache.hadoop.hbase.master.ServerManager [IPC Server handler 0 on 60000]: Registering server=10.10.30.11,60020,1294443935414, regionCount=0, userLoad=false
> {code}
> ....
> then later
> {code}
> 2011-01-07 15:45:44,247 INFO  org.apache.hadoop.hbase.master.ServerManager [IPC Server handler 2 on 60000]: Registering server=perfnode11,60020,1294443935414, regionCount=0, userLoad=true
> {code}
> This might be since we started letting servers register in other than with the reportStartup.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.


[jira] Updated: (HBASE-3431) Regionserver is not using the name given it by the master; double entry in master listing of servers

Posted by "stack (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/HBASE-3431?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

stack updated HBASE-3431:
-------------------------

    Attachment: 3431-v4.txt

> Regionserver is not using the name given it by the master; double entry in master listing of servers
> ----------------------------------------------------------------------------------------------------
>
>                 Key: HBASE-3431
>                 URL: https://issues.apache.org/jira/browse/HBASE-3431
>             Project: HBase
>          Issue Type: Bug
>    Affects Versions: 0.90.0
>            Reporter: stack
>            Assignee: stack
>            Priority: Blocker
>             Fix For: 0.90.1
>
>         Attachments: 3431-v2.txt, 3431-v3.txt, 3431-v3.txt, 3431-v4.txt, 3431.txt
>
>
> Our man Ted Dunning found the following where RS checks in with one name, the master tells it use another name but we seem to go ahead and continue with our original name.
> In RS logs I see:
> {code}
> 2011-01-07 15:45:50,757 INFO  org.apache.hadoop.hbase.regionserver.HRegionServer [regionserver60020]: Master passed us address to use. Was=perfnode11:60020, Now=10.10.30.11:60020
> {code}
> On master I see
> {code}
> 2011-01-07 15:45:38,613 INFO  org.apache.hadoop.hbase.master.ServerManager [IPC Server handler 0 on 60000]: Registering server=10.10.30.11,60020,1294443935414, regionCount=0, userLoad=false
> {code}
> ....
> then later
> {code}
> 2011-01-07 15:45:44,247 INFO  org.apache.hadoop.hbase.master.ServerManager [IPC Server handler 2 on 60000]: Registering server=perfnode11,60020,1294443935414, regionCount=0, userLoad=true
> {code}
> This might be since we started letting servers register in other than with the reportStartup.

-- 
This message is automatically generated by JIRA.
-
For more information on JIRA, see: http://www.atlassian.com/software/jira

        

[jira] Assigned: (HBASE-3431) Regionserver is not using the name given it by the master; double entry in master listing of servers

Posted by "stack (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/HBASE-3431?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

stack reassigned HBASE-3431:
----------------------------

    Assignee: stack

> Regionserver is not using the name given it by the master; double entry in master listing of servers
> ----------------------------------------------------------------------------------------------------
>
>                 Key: HBASE-3431
>                 URL: https://issues.apache.org/jira/browse/HBASE-3431
>             Project: HBase
>          Issue Type: Bug
>    Affects Versions: 0.90.0
>            Reporter: stack
>            Assignee: stack
>            Priority: Blocker
>             Fix For: 0.90.1
>
>         Attachments: 3431.txt
>
>
> Our man Ted Dunning found the following where RS checks in with one name, the master tells it use another name but we seem to go ahead and continue with our original name.
> In RS logs I see:
> {code}
> 2011-01-07 15:45:50,757 INFO  org.apache.hadoop.hbase.regionserver.HRegionServer [regionserver60020]: Master passed us address to use. Was=perfnode11:60020, Now=10.10.30.11:60020
> {code}
> On master I see
> {code}
> 2011-01-07 15:45:38,613 INFO  org.apache.hadoop.hbase.master.ServerManager [IPC Server handler 0 on 60000]: Registering server=10.10.30.11,60020,1294443935414, regionCount=0, userLoad=false
> {code}
> ....
> then later
> {code}
> 2011-01-07 15:45:44,247 INFO  org.apache.hadoop.hbase.master.ServerManager [IPC Server handler 2 on 60000]: Registering server=perfnode11,60020,1294443935414, regionCount=0, userLoad=true
> {code}
> This might be since we started letting servers register in other than with the reportStartup.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.


[jira] Commented: (HBASE-3431) Regionserver is not using the name given it by the master; double entry in master listing of servers

Posted by "stack (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/HBASE-3431?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12991099#comment-12991099 ] 

stack commented on HBASE-3431:
------------------------------

Tested w/ name resolution broke on both ends.  If I broke lookup good, server wouldn't start complaining couldn't resolve name (thats not new to my patch).  If no resolve when it got to server side then again same thing w/ a complaint that couldn't resolve regionserver name... again not new to my patch... more a commentary on how hbase will complain loudly already if resolve is mangled.  Messages are pretty plain about whats wrong.

I broke master resolve so the incoming RS did not resolve to a proper address -- in the past we'd send back an IP and use that ever after and then you'd have double-vision after next heartbeat -- and then on RS I broke it so passed back a FQDN when Master was dealing in host names only.  That worked too.

Review please.  Unit tests are hard to do.  Would have to somehow mock java dns lookup.  Changing the dns doesn't seem to be possible (I can see providing alternate dns provider to jndi if you provide flags on JVM startup).


> Regionserver is not using the name given it by the master; double entry in master listing of servers
> ----------------------------------------------------------------------------------------------------
>
>                 Key: HBASE-3431
>                 URL: https://issues.apache.org/jira/browse/HBASE-3431
>             Project: HBase
>          Issue Type: Bug
>    Affects Versions: 0.90.0
>            Reporter: stack
>            Assignee: stack
>            Priority: Blocker
>             Fix For: 0.90.1
>
>         Attachments: 3431-v2.txt, 3431-v3.txt, 3431-v3.txt, 3431-v4.txt, 3431.txt
>
>
> Our man Ted Dunning found the following where RS checks in with one name, the master tells it use another name but we seem to go ahead and continue with our original name.
> In RS logs I see:
> {code}
> 2011-01-07 15:45:50,757 INFO  org.apache.hadoop.hbase.regionserver.HRegionServer [regionserver60020]: Master passed us address to use. Was=perfnode11:60020, Now=10.10.30.11:60020
> {code}
> On master I see
> {code}
> 2011-01-07 15:45:38,613 INFO  org.apache.hadoop.hbase.master.ServerManager [IPC Server handler 0 on 60000]: Registering server=10.10.30.11,60020,1294443935414, regionCount=0, userLoad=false
> {code}
> ....
> then later
> {code}
> 2011-01-07 15:45:44,247 INFO  org.apache.hadoop.hbase.master.ServerManager [IPC Server handler 2 on 60000]: Registering server=perfnode11,60020,1294443935414, regionCount=0, userLoad=true
> {code}
> This might be since we started letting servers register in other than with the reportStartup.

-- 
This message is automatically generated by JIRA.
-
For more information on JIRA, see: http://www.atlassian.com/software/jira

        

[jira] Updated: (HBASE-3431) Regionserver is not using the name given it by the master; double entry in master listing of servers

Posted by "stack (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/HBASE-3431?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

stack updated HBASE-3431:
-------------------------

    Attachment: 3431-v2.txt

The issue is that if the master sees a RegionServer differently to how the RS sees itself -- e.g. master gets an ip when it does lookup though RS passed a name or if RS passed a FQDN but master has hostname only -- then the master will ask the RS to take on the name the Master sees by passing it back an HServerAddress.  This does not work if the two servers are getting different answers from their respective DNS's.  The Master knows RS's by their 'ServerName' which is hostname+port+startcode.  If DNS is wonky, then the Master and RS will come up with different 'ServerName's even if the Master passes back its HSA (HSA could be IP only, RS does lookup and comes up w/ different hostname if DNS is broke).  This patch removes the code that has master trying the RS the identity to use.  Instead Master just uses the ServerName the RS volunteered.

So far in testing it seems to work when DNS is set up properly and when Master side DNS is broke where its finding IP only for RS.  Let me do some more testing.

> Regionserver is not using the name given it by the master; double entry in master listing of servers
> ----------------------------------------------------------------------------------------------------
>
>                 Key: HBASE-3431
>                 URL: https://issues.apache.org/jira/browse/HBASE-3431
>             Project: HBase
>          Issue Type: Bug
>    Affects Versions: 0.90.0
>            Reporter: stack
>            Assignee: stack
>            Priority: Blocker
>             Fix For: 0.90.1
>
>         Attachments: 3431-v2.txt, 3431.txt
>
>
> Our man Ted Dunning found the following where RS checks in with one name, the master tells it use another name but we seem to go ahead and continue with our original name.
> In RS logs I see:
> {code}
> 2011-01-07 15:45:50,757 INFO  org.apache.hadoop.hbase.regionserver.HRegionServer [regionserver60020]: Master passed us address to use. Was=perfnode11:60020, Now=10.10.30.11:60020
> {code}
> On master I see
> {code}
> 2011-01-07 15:45:38,613 INFO  org.apache.hadoop.hbase.master.ServerManager [IPC Server handler 0 on 60000]: Registering server=10.10.30.11,60020,1294443935414, regionCount=0, userLoad=false
> {code}
> ....
> then later
> {code}
> 2011-01-07 15:45:44,247 INFO  org.apache.hadoop.hbase.master.ServerManager [IPC Server handler 2 on 60000]: Registering server=perfnode11,60020,1294443935414, regionCount=0, userLoad=true
> {code}
> This might be since we started letting servers register in other than with the reportStartup.

-- 
This message is automatically generated by JIRA.
-
For more information on JIRA, see: http://www.atlassian.com/software/jira

        

[jira] Updated: (HBASE-3431) Regionserver is not using the name given it by the master; double entry in master listing of servers

Posted by "stack (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/HBASE-3431?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

stack updated HBASE-3431:
-------------------------

    Attachment: 3431.txt

A unit test that has master setting an address for the regionserver to use and then verifying that subsequently the regionserver volunteers what we told it use.  Passes on TRUNK which would seem to say this stuff should be working fine.

I compared 0.89 and 0.90 HRS.  Nothing jumps out.

> Regionserver is not using the name given it by the master; double entry in master listing of servers
> ----------------------------------------------------------------------------------------------------
>
>                 Key: HBASE-3431
>                 URL: https://issues.apache.org/jira/browse/HBASE-3431
>             Project: HBase
>          Issue Type: Bug
>    Affects Versions: 0.90.0
>            Reporter: stack
>         Attachments: 3431.txt
>
>
> Our man Ted Dunning found the following where RS checks in with one name, the master tells it use another name but we seem to go ahead and continue with our original name.
> In RS logs I see:
> {code}
> 2011-01-07 15:45:50,757 INFO  org.apache.hadoop.hbase.regionserver.HRegionServer [regionserver60020]: Master passed us address to use. Was=perfnode11:60020, Now=10.10.30.11:60020
> {code}
> On master I see
> {code}
> 2011-01-07 15:45:38,613 INFO  org.apache.hadoop.hbase.master.ServerManager [IPC Server handler 0 on 60000]: Registering server=10.10.30.11,60020,1294443935414, regionCount=0, userLoad=false
> {code}
> ....
> then later
> {code}
> 2011-01-07 15:45:44,247 INFO  org.apache.hadoop.hbase.master.ServerManager [IPC Server handler 2 on 60000]: Registering server=perfnode11,60020,1294443935414, regionCount=0, userLoad=true
> {code}
> This might be since we started letting servers register in other than with the reportStartup.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.


[jira] Updated: (HBASE-3431) Regionserver is not using the name given it by the master; double entry in master listing of servers

Posted by "stack (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/HBASE-3431?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

stack updated HBASE-3431:
-------------------------

    Attachment: 3431-v3.txt

More DNS breaking turned up fact that on startup, Master should not be setting RS address into HSI.  Still testing.

> Regionserver is not using the name given it by the master; double entry in master listing of servers
> ----------------------------------------------------------------------------------------------------
>
>                 Key: HBASE-3431
>                 URL: https://issues.apache.org/jira/browse/HBASE-3431
>             Project: HBase
>          Issue Type: Bug
>    Affects Versions: 0.90.0
>            Reporter: stack
>            Assignee: stack
>            Priority: Blocker
>             Fix For: 0.90.1
>
>         Attachments: 3431-v2.txt, 3431-v3.txt, 3431-v3.txt, 3431.txt
>
>
> Our man Ted Dunning found the following where RS checks in with one name, the master tells it use another name but we seem to go ahead and continue with our original name.
> In RS logs I see:
> {code}
> 2011-01-07 15:45:50,757 INFO  org.apache.hadoop.hbase.regionserver.HRegionServer [regionserver60020]: Master passed us address to use. Was=perfnode11:60020, Now=10.10.30.11:60020
> {code}
> On master I see
> {code}
> 2011-01-07 15:45:38,613 INFO  org.apache.hadoop.hbase.master.ServerManager [IPC Server handler 0 on 60000]: Registering server=10.10.30.11,60020,1294443935414, regionCount=0, userLoad=false
> {code}
> ....
> then later
> {code}
> 2011-01-07 15:45:44,247 INFO  org.apache.hadoop.hbase.master.ServerManager [IPC Server handler 2 on 60000]: Registering server=perfnode11,60020,1294443935414, regionCount=0, userLoad=true
> {code}
> This might be since we started letting servers register in other than with the reportStartup.

-- 
This message is automatically generated by JIRA.
-
For more information on JIRA, see: http://www.atlassian.com/software/jira

        

[jira] Commented: (HBASE-3431) Regionserver is not using the name given it by the master; double entry in master listing of servers

Posted by "stack (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/HBASE-3431?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12991694#comment-12991694 ] 

stack commented on HBASE-3431:
------------------------------

I can't use EnvironmentEdge to change addresses since the InetSocketAddress that is at root of our HServerAddress, etc., is taken from the socket down in RPC -- I can't interject EnvironmentEdge inside Socket.getLocalSocketAddress, etc.

I can't change how HSA or HSI serialize since this is a point release.

All this is going to go away, or at least change radically, 0.92 because we intend dropping heartbeat.

> Regionserver is not using the name given it by the master; double entry in master listing of servers
> ----------------------------------------------------------------------------------------------------
>
>                 Key: HBASE-3431
>                 URL: https://issues.apache.org/jira/browse/HBASE-3431
>             Project: HBase
>          Issue Type: Bug
>    Affects Versions: 0.90.0
>            Reporter: stack
>            Assignee: stack
>            Priority: Blocker
>             Fix For: 0.90.1
>
>         Attachments: 3431-v2.txt, 3431-v3.txt, 3431-v3.txt, 3431-v4.txt, 3431.txt
>
>
> Our man Ted Dunning found the following where RS checks in with one name, the master tells it use another name but we seem to go ahead and continue with our original name.
> In RS logs I see:
> {code}
> 2011-01-07 15:45:50,757 INFO  org.apache.hadoop.hbase.regionserver.HRegionServer [regionserver60020]: Master passed us address to use. Was=perfnode11:60020, Now=10.10.30.11:60020
> {code}
> On master I see
> {code}
> 2011-01-07 15:45:38,613 INFO  org.apache.hadoop.hbase.master.ServerManager [IPC Server handler 0 on 60000]: Registering server=10.10.30.11,60020,1294443935414, regionCount=0, userLoad=false
> {code}
> ....
> then later
> {code}
> 2011-01-07 15:45:44,247 INFO  org.apache.hadoop.hbase.master.ServerManager [IPC Server handler 2 on 60000]: Registering server=perfnode11,60020,1294443935414, regionCount=0, userLoad=true
> {code}
> This might be since we started letting servers register in other than with the reportStartup.

-- 
This message is automatically generated by JIRA.
-
For more information on JIRA, see: http://www.atlassian.com/software/jira

        

[jira] Commented: (HBASE-3431) Regionserver is not using the name given it by the master; double entry in master listing of servers

Posted by "stack (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/HBASE-3431?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12987320#action_12987320 ] 

stack commented on HBASE-3431:
------------------------------

Workaround is to make reverse DNS on master produce same hostname as that which the RegionServer reports (RS hostname lookup).

> Regionserver is not using the name given it by the master; double entry in master listing of servers
> ----------------------------------------------------------------------------------------------------
>
>                 Key: HBASE-3431
>                 URL: https://issues.apache.org/jira/browse/HBASE-3431
>             Project: HBase
>          Issue Type: Bug
>    Affects Versions: 0.90.0
>            Reporter: stack
>            Priority: Blocker
>             Fix For: 0.90.1
>
>         Attachments: 3431.txt
>
>
> Our man Ted Dunning found the following where RS checks in with one name, the master tells it use another name but we seem to go ahead and continue with our original name.
> In RS logs I see:
> {code}
> 2011-01-07 15:45:50,757 INFO  org.apache.hadoop.hbase.regionserver.HRegionServer [regionserver60020]: Master passed us address to use. Was=perfnode11:60020, Now=10.10.30.11:60020
> {code}
> On master I see
> {code}
> 2011-01-07 15:45:38,613 INFO  org.apache.hadoop.hbase.master.ServerManager [IPC Server handler 0 on 60000]: Registering server=10.10.30.11,60020,1294443935414, regionCount=0, userLoad=false
> {code}
> ....
> then later
> {code}
> 2011-01-07 15:45:44,247 INFO  org.apache.hadoop.hbase.master.ServerManager [IPC Server handler 2 on 60000]: Registering server=perfnode11,60020,1294443935414, regionCount=0, userLoad=true
> {code}
> This might be since we started letting servers register in other than with the reportStartup.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.


[jira] Commented: (HBASE-3431) Regionserver is not using the name given it by the master; double entry in master listing of servers

Posted by "stack (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/HBASE-3431?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12991661#comment-12991661 ] 

stack commented on HBASE-3431:
------------------------------

bq. Looking in hdfs, datanode generates a registration name – e.g. DS-198919343-10.20.20.187-10010-1291133524722 – and this is how it identifies itself to NN regardless. No messing w/ NN telling it what name to use. 

J-D points out that I'm reading this code lazily (i.e. wrong), that on registration, the NN returns a DataRegistration instance that the DN will use going forward.

> Regionserver is not using the name given it by the master; double entry in master listing of servers
> ----------------------------------------------------------------------------------------------------
>
>                 Key: HBASE-3431
>                 URL: https://issues.apache.org/jira/browse/HBASE-3431
>             Project: HBase
>          Issue Type: Bug
>    Affects Versions: 0.90.0
>            Reporter: stack
>            Assignee: stack
>            Priority: Blocker
>             Fix For: 0.90.1
>
>         Attachments: 3431-v2.txt, 3431-v3.txt, 3431-v3.txt, 3431-v4.txt, 3431.txt
>
>
> Our man Ted Dunning found the following where RS checks in with one name, the master tells it use another name but we seem to go ahead and continue with our original name.
> In RS logs I see:
> {code}
> 2011-01-07 15:45:50,757 INFO  org.apache.hadoop.hbase.regionserver.HRegionServer [regionserver60020]: Master passed us address to use. Was=perfnode11:60020, Now=10.10.30.11:60020
> {code}
> On master I see
> {code}
> 2011-01-07 15:45:38,613 INFO  org.apache.hadoop.hbase.master.ServerManager [IPC Server handler 0 on 60000]: Registering server=10.10.30.11,60020,1294443935414, regionCount=0, userLoad=false
> {code}
> ....
> then later
> {code}
> 2011-01-07 15:45:44,247 INFO  org.apache.hadoop.hbase.master.ServerManager [IPC Server handler 2 on 60000]: Registering server=perfnode11,60020,1294443935414, regionCount=0, userLoad=true
> {code}
> This might be since we started letting servers register in other than with the reportStartup.

-- 
This message is automatically generated by JIRA.
-
For more information on JIRA, see: http://www.atlassian.com/software/jira