Posted to dev@bigtop.apache.org by "Konstantin Boudnik (JIRA)" <ji...@apache.org> on 2014/12/15 16:31:13 UTC

[jira] [Commented] (BIGTOP-1573) rpm init scripts do not wait for network

    [ https://issues.apache.org/jira/browse/BIGTOP-1573?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14246741#comment-14246741 ] 

Konstantin Boudnik commented on BIGTOP-1573:
--------------------------------------------

Will this change work with the current init.d bootstrap sequence?

> rpm init scripts do not wait for network
> ----------------------------------------
>
>                 Key: BIGTOP-1573
>                 URL: https://issues.apache.org/jira/browse/BIGTOP-1573
>             Project: Bigtop
>          Issue Type: Bug
>          Components: rpm
>    Affects Versions: 0.8.0
>         Environment: CentOS 7
>            Reporter: Alexander van der Meij
>              Labels: build
>
> I have used Bigtop to generate a set of RPMs for the purpose of deploying multi-node Hadoop clusters. All the components work well, save for one network issue.
> It seems that the Hadoop daemons, when started at boot through their init scripts, do not wait for network initialisation to complete before they themselves are processed. As a result, when I reboot, for example, one of my datanodes, the hadoop-hdfs-datanode process is started using "localhost.localdomain" as its hostname - and it also advertises itself as such to the ResourceManager, leading to all sorts of connectivity problems in a multi-node environment.
> I first noticed this problem when, after a reboot, I saw log files being created of the form /var/log/hadoop-hdfs-datanode-localhost.localdomain.log. When I restart the hdfs-datanode process using the same init scripts, the correct /var/log/hadoop-hdfs-datanode-{fqdn}.log files are created.
> I believe the problem is caused by the introduction of systemd in CentOS 7; init scripts are run in parallel and there are no constraints present in the Hadoop init scripts that instruct them to wait until network initialisation is complete.
> Now for the good news: adding $network to the Required-Start/Stop lists for all Hadoop daemons solves the issue for me:
> /etc/init.d/hadoop-hdfs-datanode:
> # Required-Start:    $syslog $remote_fs $network
> # Required-Stop:     $syslog $remote_fs $network
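
To see which installed init scripts still lack the dependency described above, a check along the following lines could be used. This is only a rough sketch: the helper name has_network_dep is made up for illustration, and the /etc/init.d/hadoop-* glob assumes the Hadoop packages install their scripts under that prefix, as the hadoop-hdfs-datanode example suggests.

```shell
#!/bin/sh
# Hypothetical helper (not part of Bigtop): succeed if the LSB header of
# the init script given as $1 lists $network on its Required-Start line.
has_network_dep() {
    grep -Eq '^#[[:space:]]*Required-Start:.*\$network([[:space:]]|$)' "$1"
}

# Assumed location and naming of the Hadoop init scripts.
for script in /etc/init.d/hadoop-*; do
    # Skip the unexpanded glob when no script matches.
    [ -f "$script" ] || continue
    has_network_dep "$script" || echo "missing \$network: $script"
done
```

The loop prints one line per script whose Required-Start header does not mention $network and prints nothing when every script already declares it. It only inspects Required-Start; Required-Stop would need the same check.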



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)