Posted to hdfs-user@hadoop.apache.org by Oren <or...@infolinks.com> on 2012/01/04 16:08:36 UTC

reduce network problem after using cache dns

Hi.
I have a small Hadoop grid connected over a 1G network.
When the servers are configured to use the local DNS server, the jobs
run without a problem and copy speed during the reduce phase is tens of MB/s.
Once I change the servers to use a caching-only named server on each
node, I start to get failed tasks with timeout errors.
Copy speed also drops to under 1 MB/s.

There is NO degradation in the network: copying files between servers
still runs at tens of MB/s.
Resolving works fine, and at roughly the same speed, with both
configurations.

Any idea what happens during the map/reduce process that causes this
behavior?
This is an example of the exceptions I get during the map phase:
Too many fetch-failures

and during reduce:
java.lang.RuntimeException: org.apache.hadoop.hbase.ZooKeeperConnectionException: java.net.UnknownHostException: s06.xxx.local
    at org.apache.hadoop.hbase.client.HTableFactory.createHTableInterface(HTableFactory.java:38)
    at org.apache.hadoop.hbase.client.HTablePool.createHTable(HTablePool.java:129)
    at org.apache.hadoop.hbase.client.HTablePool.getTable(HTablePool.java:89)
    at com.infolinks.hadoop.commons.hbase.HBaseOperations.getTable(HBaseOperations.java:118)
    at com.infolinks.hadoop.framework.HBaseReducer.setup(HBaseReducer.java:71)
    at org.apache.hadoop.mapreduce.Reducer.run(Reducer.java:174)
    at org.apache.hadoop.mapred.ReduceTask.runNewReducer(ReduceTask.java:566)
    at org.apache.hadoop.mapred.ReduceTask.run(ReduceTask.java:408)
    at org.apache.hadoop.mapred.Child.main(Child.java:170)
Caused by: org.apache.hadoop.hbase.ZooKeeperConnectionException: java.net.UnknownHostException: s06.xxx.local
    at org.apache.hadoop.hbase.client.HConnectionManager$HConnectionImplementation.getZooKeeperWatcher(HConnectionManager.java:1000)
    at org.apache.hadoop.hbase.client.HConnectionManager$HConnectionImplementation.setupZookeeperTrackers(HConnectionManager.java:303)
    at org.apache.hadoop.hbase.client.HConnectionManager$HConnectionImplementation.<init>(HConnectionManager.java:294)
    at org.apache.hadoop.hbase.client.HConnectionManager.getConnection(HConnectionManager.java:156)
    at org.apache.hadoop.hbase.client.HTable.<init>(HTable.java:167)
    at org.apache.hadoop.hbase.client.HTableFactory.createHTableInterface(HTableFactory.java:36)
    ... 8 more
Caused by: java.net.UnknownHostException: s06.xxx.local
    at java.net.InetAddress.getAllByName0(InetAddress.java:1158)
    at java.net.InetAddress.getAllByName(InetAddress.java:1084)
    at java.net.InetAddress.getAllByName(InetAddress.java:1020)
    at org.apache.zookeeper.ClientCnxn.<init>(ClientCnxn.java:386)
    at org.apache.zookeeper.ClientCnxn.<init>(ClientCnxn.java:331)
    at org.apache.zookeeper.ZooKeeper.<init>(ZooKeeper.java:377)
    at org.apache.hadoop.hbase.zookeeper.ZKUtil.connect(ZKUtil.java:97)
    at org.apache.hadoop.hbase.zookeeper.ZooKeeperWatcher.<init>(ZooKeeperWatcher.java:119)
    at org.apache.hadoop.hbase.client.HConnectionManager$HConnectionImplementation.getZooKeeperWatcher(HConnectionManager.java:998)
    ... 13 more
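The innermost "Caused by" originates in java.net.InetAddress.getAllByName, so the failing lookup can be reproduced with the same call the ZooKeeper client makes, going through the JVM's resolver rather than through dig's own. A minimal sketch (the hostname argument is a placeholder; substitute the failing quorum host, e.g. s06.xxx.local):

```java
import java.net.InetAddress;
import java.net.UnknownHostException;

// Resolve a hostname with the same call the ZooKeeper client uses
// (InetAddress.getAllByName), so the result reflects the JVM's
// resolver chain rather than dig's direct DNS queries.
public class ResolveCheck {
    static String resolve(String host) {
        try {
            StringBuilder sb = new StringBuilder(host + " ->");
            for (InetAddress a : InetAddress.getAllByName(host)) {
                sb.append(' ').append(a.getHostAddress());
            }
            return sb.toString();
        } catch (UnknownHostException e) {
            // The same exception surfacing in the reduce tasks above.
            return host + " -> UnknownHostException";
        }
    }

    public static void main(String[] args) {
        // Substitute a failing quorum host, e.g. s06.xxx.local.
        System.out.println(resolve(args.length > 0 ? args[0] : "localhost"));
    }
}
```

If this sketch fails for a host that dig resolves, the problem sits in the system resolver chain the JVM uses, not in DNS itself.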

thank you,
Oren.


Re: reduce network problem after using cache dns

Posted by Daryn Sharp <da...@yahoo-inc.com>.
I'm not sure whether Java uses the system's libc resolver, but assuming it does, you cannot test it with utilities like nslookup or dig, because they use their own resolver. Ping usually uses the libc resolver. If you are on Linux, you can use "getent hosts $hostname" to definitively test the libc resolver.

If you really do want to use mDNS hosts (i.e. names ending in ".local"), then you must have nss_mdns installed on your system and configure /etc/nsswitch.conf to use it. You may also want to consider using nscd to cache DNS lookups. Although if you are using mDNS, given its dynamic nature, you may not want to cache results (especially negative lookups) for very long unless the host is assigned a static IP.
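As a sketch of the nss_mdns point, a typical "hosts" line in /etc/nsswitch.conf looks roughly like this (mdns4_minimal is the module commonly shipped with Avahi's nss-mdns; exact module names vary by distribution):

```
# /etc/nsswitch.conf -- consult /etc/hosts first, then mDNS for
# .local names, then regular DNS (sketch; adjust to your distro)
hosts: files mdns4_minimal [NOTFOUND=return] dns
```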

I hope this helps.

Daryn


On Jan 4, 2012, at 10:53 AM, Alexander Lorenz wrote:

> Hi,
> 
> Please ping the host you want to reach and check your hosts file and your resolv.conf
> 
> - Alex
> 
> Alexander Lorenz
> http://mapredit.blogspot.com


Re: reduce network problem after using cache dns

Posted by Alexander Lorenz <wg...@googlemail.com>.
Hi,

Please ping the host you want to reach and check your hosts file and your resolv.conf

- Alex

Alexander Lorenz
http://mapredit.blogspot.com

On Jan 4, 2012, at 7:28 AM, Oren <or...@infolinks.com> wrote:

> So it seems, but doing a dig from the terminal returns the results correctly.
> The same settings have been running on production servers (not Hadoop) for months without problems.
> 
> Clarification: I changed the server names in the logs; the domain isn't xxx.local originally.
> 

Re: reduce network problem after using cache dns

Posted by Oren <or...@infolinks.com>.
So it seems, but doing a dig from the terminal command line returns the
results correctly.
The same settings have been running on production servers (not Hadoop)
for months without problems.

Clarification: I changed the server names in the logs; the domain isn't
xxx.local originally.


On 01/04/2012 05:19 PM, Harsh J wrote:
> Looks like your caching DNS servers aren't really functioning as you'd
> expect them to?
>
>> org.apache.hadoop.hbase.ZooKeeperConnectionException:
>> java.net.UnknownHostException: s06.xxx.local
> (That .local also worries me, you probably have a misconfiguration in
> resolution somewhere.)


Re: reduce network problem after using cache dns

Posted by Harsh J <ha...@cloudera.com>.
Looks like your caching DNS servers aren't really functioning as you'd
expect them to?

> org.apache.hadoop.hbase.ZooKeeperConnectionException:
> java.net.UnknownHostException: s06.xxx.local

(That .local also worries me; you probably have a misconfiguration in
resolution somewhere.)




-- 
Harsh J