Posted to dev@ambari.apache.org by "Di Li (JIRA)" <ji...@apache.org> on 2015/06/24 22:04:05 UTC

[jira] [Assigned] (AMBARI-11854) ambari-agent fails to start when node has multiple network cards with some does not have IP address

     [ https://issues.apache.org/jira/browse/AMBARI-11854?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Di Li reassigned AMBARI-11854:
------------------------------

    Assignee: Di Li

> ambari-agent fails to start when node has multiple network cards with some does not have IP address
> ---------------------------------------------------------------------------------------------------
>
>                 Key: AMBARI-11854
>                 URL: https://issues.apache.org/jira/browse/AMBARI-11854
>             Project: Ambari
>          Issue Type: Bug
>          Components: ambari-agent
>    Affects Versions: 2.1.0.
>         Environment: AMD
>            Reporter: Tuong Truong
>            Assignee: Di Li
>             Fix For: 2.1.0.
>
>         Attachments: AMBARI-11854.patch
>
>
> In a cluster whose nodes have multiple network interfaces, ambari-agent fails to start when one or more active network interfaces are not bound to an IP address.
> /var/log/ambari-agent/ambari-agent.out shows:
> Traceback (most recent call last):
>   File "/usr/lib64/python2.6/threading.py", line 532, in __bootstrap_inner
>     self.run()
>   File "/usr/lib/python2.6/site-packages/ambari_agent/Controller.py", line 354, in run
>     self.register = Register(self.config)
>   File "/usr/lib/python2.6/site-packages/ambari_agent/Register.py", line 34, in __init__
>     self.hardware = Hardware()
>   File "/usr/lib/python2.6/site-packages/ambari_agent/Hardware.py", line 41, in __init__
>     self.hardware.update(Facter().facterInfo())
>   File "/usr/lib/python2.6/site-packages/ambari_agent/Facter.py", line 466, in facterInfo
>     facterInfo = super(FacterLinux, self).facterInfo()
>   File "/usr/lib/python2.6/site-packages/ambari_agent/Facter.py", line 161, in facterInfo
>     facterInfo['netmask'] = self.getNetmask()
>   File "/usr/lib/python2.6/site-packages/ambari_agent/Facter.py", line 384, in getNetmask
>     if primary_ip == self.get_ip_address_by_ifname(i.strip()).strip():
>   File "/usr/lib/python2.6/site-packages/ambari_agent/Facter.py", line 397, in get_ip_address_by_ifname
>     struct.pack('256s', ifname[:15])
> IOError: [Errno 99] Cannot assign requested address
> Running 'python /usr/lib/python2.6/site-packages/ambari_agent/Facter.py' manually on the nodes that failed to register produced the same error.
> Running it on nodes where registration succeeded returns a JSON-like response such as:
> {'kernel': 'Linux', 'domain': 'svl.ibm.com', 'kernelrelease': '2.6.32-504.el6.x86_64', 'uptime_days': '0', 'memorytotal': 49413988, 'swapfree': '8.00 GB', 'processorcount': 24, 'selinux': False, 'timezone': 'PST', 'hardwareisa': 'x86_64', 'operatingsystem': 'redhat', 'hostname': 'hdperf014', 'id': 'root', 'memoryfree': 48185456, 'hardwaremodel': 'x86_64', 'uptime_seconds': '11578', 'osfamily': 'redhat', 'memorysize': 49413988, 'interfaces': 'eth0,lo', 'physicalprocessorcount': 24, 'swapsize': '8.00 GB', 'netmask': '255.255.255.0', 'ipaddress': '9.30.75.23', 'kernelmajversion': '2.6', 'kernelversion': '2.6.32', 'macaddress': '00:02:C9:4B:57:62', 'operatingsystemrelease': '6.6', 'uptime_hours': '3', 'fqdn': 'hdperf014.svl.ibm.com', 'architecture': 'x86_64'}
> [root@xxxxx ambari-agent]# ifconfig
> eth0      Link encap:Ethernet  HWaddr 5C:F3:FC:A6:48:B4
>           UP BROADCAST MULTICAST  MTU:1500  Metric:1
>           RX packets:0 errors:0 dropped:0 overruns:0 frame:0
>           TX packets:0 errors:0 dropped:0 overruns:0 carrier:0
>           collisions:0 txqueuelen:1000
>           RX bytes:0 (0.0 b)  TX bytes:0 (0.0 b)
>           Memory:93360000-9337ffff
> eth2      Link encap:Ethernet  HWaddr 00:02:C9:4B:57:CE
>           inet addr:9.30.75.21  Bcast:9.30.75.255  Mask:255.255.255.0
>           UP BROADCAST RUNNING MULTICAST  MTU:9000  Metric:1
>           RX packets:48830 errors:0 dropped:0 overruns:0 frame:0
>           TX packets:25329 errors:0 dropped:0 overruns:0 carrier:0
>           collisions:0 txqueuelen:2000
>           RX bytes:64833325 (61.8 MiB)  TX bytes:2582433 (2.4 MiB)
> lo        Link encap:Local Loopback
>           inet addr:127.0.0.1  Mask:255.0.0.0
>           UP LOOPBACK RUNNING  MTU:65536  Metric:1
>           RX packets:21 errors:0 dropped:0 overruns:0 frame:0
>           TX packets:21 errors:0 dropped:0 overruns:0 carrier:0
>           collisions:0 txqueuelen:0
>           RX bytes:1560 (1.5 KiB)  TX bytes:1560 (1.5 KiB)
> The workaround is to deactivate the network interface: ifconfig eth0 down
> ifconfig now shows:
> [root@hdperf012 ambari_agent]# ifconfig
> eth2      Link encap:Ethernet  HWaddr 00:02:C9:4B:57:CE
>           inet addr:9.30.75.21  Bcast:9.30.75.255  Mask:255.255.255.0
>           UP BROADCAST RUNNING MULTICAST  MTU:9000  Metric:1
>           RX packets:49006 errors:0 dropped:0 overruns:0 frame:0
>           TX packets:25420 errors:0 dropped:0 overruns:0 carrier:0
>           collisions:0 txqueuelen:2000
>           RX bytes:64847473 (61.8 MiB)  TX bytes:2593953 (2.4 MiB)
> lo        Link encap:Local Loopback
>           inet addr:127.0.0.1  Mask:255.0.0.0
>           UP LOOPBACK RUNNING  MTU:65536  Metric:1
>           RX packets:21 errors:0 dropped:0 overruns:0 frame:0
>           TX packets:21 errors:0 dropped:0 overruns:0 carrier:0
>           collisions:0 txqueuelen:0
>           RX bytes:1560 (1.5 KiB)  TX bytes:1560 (1.5 KiB)
> ambari-agent comes up afterward.
> The same machine did not hit the problem with the prior Ambari build.
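
The traceback in the description points at get_ip_address_by_ifname raising IOError [Errno 99] for an interface that is up but has no IPv4 address (eth0 in the ifconfig output). The following is a minimal sketch of one way such a lookup could tolerate that case. It is not the attached AMBARI-11854.patch. It assumes the lookup uses the common Linux SIOCGIFADDR ioctl recipe (only the struct.pack line is visible in the traceback), and find_interface_for_ip is a hypothetical helper name introduced here for illustration.

```python
import errno
import fcntl
import socket
import struct

# Linux ioctl request number for "get interface address" (see netdevice(7)).
SIOCGIFADDR = 0x8915


def get_ip_address_by_ifname(ifname):
    """Return the IPv4 address bound to ifname, or None if it has none.

    Assumption: the lookup is done with the SIOCGIFADDR ioctl, which fails
    with errno 99 (EADDRNOTAVAIL) when the interface has no IPv4 address.
    """
    sock = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
    try:
        request = struct.pack('256s', ifname[:15].encode('ascii'))
        packed = fcntl.ioctl(sock.fileno(), SIOCGIFADDR, request)
        # Bytes 20..23 of the returned struct ifreq hold the IPv4 address.
        return socket.inet_ntoa(packed[20:24])
    except (IOError, OSError) as e:
        if e.errno == errno.EADDRNOTAVAIL:
            return None  # interface is up but has no address: skip it
        raise  # any other failure is still reported to the caller
    finally:
        sock.close()


def find_interface_for_ip(primary_ip, ifnames, lookup=get_ip_address_by_ifname):
    """Hypothetical helper: name of the interface carrying primary_ip.

    Interfaces for which lookup() returns None are skipped instead of
    aborting the whole scan.
    """
    for name in ifnames:
        name = name.strip()
        ip = lookup(name)
        if ip is not None and ip.strip() == primary_ip:
            return name
    return None
```

With the interfaces from the report, find_interface_for_ip('9.30.75.21', ['eth0', 'eth2', 'lo']) would be expected to skip eth0 and match eth2. The lookup parameter exists only so the scan can be exercised without real network interfaces.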



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)