Posted to user@hbase.apache.org by Rural Hunter <ru...@gmail.com> on 2013/12/10 02:31:45 UTC

Why hadoop/hbase uses DNS/hosts/hostname in such a strange way?

Hi,

I recently configured a Hadoop/HBase cluster and found the DNS, hostname
and /etc/hosts configuration to be a real mess. There are many questions
about this all over the internet. So I'm wondering why Hadoop/HBase was
designed in such a strange way, which is very unusual compared with
other networked/distributed applications. In normal applications, DNS is
used to identify other servers (logical or physical), not the server
itself. But I'm seeing this weird behavior in Hadoop/HBase.

Say we have server1 and server2 configured this way:

server1(ip 192.168.1.2)
hostname: server1
/etc/hosts:
127.0.0.1    localhost,server1
192.168.1.3    server2

server2(ip 192.168.1.3)
hostname: server2
/etc/hosts:
127.0.0.1    localhost,server2
192.168.1.2    server1
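
To make the setup concrete, here is a minimal Python sketch that checks
whether a machine's own hostname resolves to a loopback address. It only
parses hosts-file text and is not how Hadoop/HBase itself does lookups.
It assumes the aliases on a line are separated by whitespace (the usual
/etc/hosts syntax; they are joined with a comma above), and the names
SERVER1_HOSTS, parse_hosts and own_name_is_loopback are made up for
illustration.

```python
import ipaddress

# server1's hosts file from the example, rewritten with whitespace-separated
# aliases (an assumption; the post joins them with a comma).
SERVER1_HOSTS = """\
127.0.0.1    localhost server1
192.168.1.3  server2
"""


def parse_hosts(text):
    """Return a dict mapping each name to the first IP address listed for it."""
    table = {}
    for line in text.splitlines():
        line = line.split("#", 1)[0].strip()  # drop comments and blank lines
        if not line:
            continue
        ip, *names = line.split()
        for name in names:
            table.setdefault(name, ip)  # in this sketch, the first match wins
    return table


def own_name_is_loopback(hosts_text, hostname):
    """True if `hostname` maps to a loopback address in the hosts text."""
    ip = parse_hosts(hosts_text).get(hostname)
    return ip is not None and ipaddress.ip_address(ip).is_loopback


print(own_name_is_loopback(SERVER1_HOSTS, "server1"))  # True
print(own_name_is_loopback(SERVER1_HOSTS, "server2"))  # False
```

With this layout server1's own name points at 127.0.0.1, while server2's
name points at a routable address.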

With the configuration above, I'm seeing many cases of Hadoop/HBase
trying to connect to localhost when it actually should connect to
another server. I believe this is because server1 reported its hostname
as 'localhost' to server2, and server2 then tries to use 'localhost' to
connect to server1. But it shouldn't work that way. In normal network
applications, server2 shouldn't try to connect to server1 using the name
server1 reported. If server2 initiates the connection, it should use DNS
or /etc/hosts to resolve server1. If server1 initiates the connection,
server2 should use the IP it gets from the already established
connection from server1. There shouldn't be any confusion or mess.
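
The failure I suspect can be sketched as a small Python simulation. This
is only a model of my guess, not Hadoop/HBase code: the FORWARD tables
mirror the two hosts files above, and advertised_name and connect_target
are hypothetical helpers. It assumes a machine reports the first name
listed for the IP its own hostname resolves to.

```python
# Hypothetical per-machine lookup tables mirroring the two hosts files.
# Dict order matters here: 'localhost' is listed before the machine's own name.
FORWARD = {
    "server1": {
        "localhost": "127.0.0.1",
        "server1": "127.0.0.1",
        "server2": "192.168.1.3",
    },
    "server2": {
        "localhost": "127.0.0.1",
        "server2": "127.0.0.1",
        "server1": "192.168.1.2",
    },
}


def advertised_name(machine):
    """Name a machine reports for itself: resolve its own hostname to an IP,
    then take the first name listed for that IP (an assumed behavior)."""
    table = FORWARD[machine]
    own_ip = table[machine]
    return next(name for name, ip in table.items() if ip == own_ip)


def connect_target(from_machine, name):
    """IP that `from_machine` would dial when given `name`."""
    return FORWARD[from_machine][name]


reported = advertised_name("server1")
print(reported)                                # localhost
print(connect_target("server2", reported))     # 127.0.0.1, i.e. server2 itself
print(connect_target("server2", "server1"))    # 192.168.1.2, the intended peer
```

Resolving the reported name sends server2 back to itself, while resolving
'server1' locally reaches the right machine.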

I don't see why Hadoop/HBase cannot use the same logic to handle the
DNS/hosts/hostname mess. Can anyone resolve my confusion?

Re: Why hadoop/hbase uses DNS/hosts/hostname in such a strange way?

Posted by Geovanie Marquez <ge...@gmail.com>.
This may not answer why it is designed this way, but it should give you
more insight into how it is done.

Here is how the network resolution happens:
<http://books.google.com/books?id=TQqSwRScVhoC&pg=PA58&lpg=PA58&dq=network+hostname++hadoop+operations&source=bl&ots=81GJwGnQ-j&sig=Im6hgfT1E9HsouI1Ez1yMyL3pZQ&hl=en&sa=X&ei=oSmnUtPJNoavkAfj8oCgCg&ved=0CDoQ6AEwAA#v=onepage&q=network%20hostname%20%20hadoop%20operations&f=false>
The complication may be improved, but this is what it is today.

