Posted to dev@bigtop.apache.org by "Alexander van der Meij (JIRA)" <ji...@apache.org> on 2014/12/15 16:29:13 UTC

[jira] [Created] (BIGTOP-1573) rpm init scripts do not wait for network

Alexander van der Meij created BIGTOP-1573:
----------------------------------------------

             Summary: rpm init scripts do not wait for network
                 Key: BIGTOP-1573
                 URL: https://issues.apache.org/jira/browse/BIGTOP-1573
             Project: Bigtop
          Issue Type: Bug
          Components: rpm
    Affects Versions: 0.8.0
         Environment: CentOS 7
            Reporter: Alexander van der Meij


I have used Bigtop to generate a set of RPMs for deploying multi-node Hadoop clusters. All the components work well, save for one network issue.

It seems that the Hadoop daemons, when started at boot through their init scripts, do not wait for network initialisation to complete before they are started. As a result, when I reboot one of my datanodes, for example, the hadoop-hdfs-datanode process is started using "localhost.localdomain" as its hostname - and it also advertises itself as such to the ResourceManager, leading to all sorts of connectivity problems in a multi-node environment.

I first noticed this problem when, after a reboot, I saw log files being created with names of the form /var/log/hadoop-hdfs-datanode-localhost.localdomain.log. When I restart the hdfs-datanode process using the same init scripts, the correct /var/log/hadoop-hdfs-datanode-{fqdn}.log file is created.
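
For anyone who wants to check whether a node has been affected, the symptom described above can be looked for with something like the sketch below. The function name find_placeholder_logs and the default directory are illustrative assumptions, not part of Bigtop; the file name pattern is taken from the log name quoted above.

```shell
#!/bin/sh
# Sketch only: find_placeholder_logs is a hypothetical helper, not a Bigtop tool.
# It lists Hadoop daemon logs whose file name contains the placeholder hostname
# "localhost.localdomain", which suggests the daemon started before the
# network (and therefore the real hostname) was available.
find_placeholder_logs() {
    # Directory to search; /var/log is assumed from the path in the report.
    log_dir="${1:-/var/log}"
    find "$log_dir" -type f -name 'hadoop-*-localhost.localdomain.log*' 2>/dev/null
}

# Example use (prints one path per line, nothing if no such logs exist):
#   find_placeholder_logs /var/log
```

An empty result does not prove that a node is healthy; it only means no log file with the placeholder name was found under the given directory.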

I believe the problem is caused by the introduction of systemd in CentOS 7: init scripts are run in parallel, and the Hadoop init scripts contain no constraints that instruct them to wait until network initialisation is complete.

Now for the good news: adding $network to the Required-Start and Required-Stop lists for all Hadoop daemons solves the issue for me:

/etc/init.d/hadoop-hdfs-datanode:
# Required-Start:    $syslog $remote_fs $network
# Required-Stop:     $syslog $remote_fs $network
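
As a stopgap until the packaged scripts are fixed, the same change could be applied to an installed init script with a small helper along these lines. This is a minimal sketch: add_network_dependency is a hypothetical function name, and it assumes the LSB header lines start exactly with "# Required-Start:" and "# Required-Stop:" as in the excerpt above.

```shell
#!/bin/sh
# Sketch only: add_network_dependency is a hypothetical helper, not part of Bigtop.
# It appends "$network" to the Required-Start and Required-Stop LSB header
# lines of one init script, skipping lines that already list it.
add_network_dependency() {
    script="$1"
    tmp="$(mktemp)" || return 1
    # For each matching header line that lacks $network, replace any trailing
    # whitespace with " $network". Other lines pass through unchanged.
    if sed -e '/^# Required-Start:/{/\$network/!s/[[:space:]]*$/ $network/;}' \
           -e '/^# Required-Stop:/{/\$network/!s/[[:space:]]*$/ $network/;}' \
           "$script" > "$tmp"; then
        # Write back with cat so the script keeps its owner and mode.
        cat "$tmp" > "$script"
        status=$?
    else
        status=1
    fi
    rm -f "$tmp"
    return "$status"
}

# Example use, assuming the script path from the report:
#   add_network_dependency /etc/init.d/hadoop-hdfs-datanode
```

Running the helper twice leaves the file unchanged the second time. On a systemd host the edited header is probably only picked up after a `systemctl daemon-reload` or a reboot.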


