Posted to dev@bigtop.apache.org by "Leidle, Rob" <le...@amazon.com> on 2014/11/28 21:06:11 UTC

Re: Problem using puppet scripts to configure bigtop on AmazonLinux

++dev@bigtop.apache.org

From: "Leidle, Rob" <le...@amazon.com>
Reply-To: "user@bigtop.apache.org" <us...@bigtop.apache.org>
Date: Friday, November 28, 2014 at 11:14 AM
To: "user@bigtop.apache.org" <us...@bigtop.apache.org>
Subject: Problem using puppet scripts to configure bigtop on AmazonLinux

Hello all, I am trying to configure & install Bigtop 0.8.0 using the puppet scripts on AmazonLinux on EC2. So far, almost everything has worked, apart from one minor change I had to make to the site.pp manifest. However, I am running into a problem: it seems that services such as the proxy server or the namenode are not starting immediately. You can see the namenode-related error in the log output below.


info: /Stage[main]/Hadoop::Common-yarn/File[/etc/hadoop/conf/yarn-site.xml]: Scheduling refresh of Service[hadoop-yarn-resourcemanager]

info: /Stage[main]/Hadoop::Common-yarn/File[/etc/hadoop/conf/yarn-site.xml]: Scheduling refresh of Service[hadoop-mapreduce-historyserver]

info: /Stage[main]/Hadoop::Common-yarn/File[/etc/hadoop/conf/yarn-site.xml]: Scheduling refresh of Service[hadoop-yarn-nodemanager]

info: /Stage[main]/Hadoop::Common-yarn/File[/etc/hadoop/conf/yarn-site.xml]: Scheduling refresh of Service[hadoop-yarn-proxyserver]

debug: /Stage[main]/Hadoop::Common-yarn/File[/etc/hadoop/conf/yarn-site.xml]: The container Class[Hadoop::Common-yarn] will propagate my refresh event

debug: Service[hadoop-yarn-proxyserver](provider=redhat): Executing '/sbin/service hadoop-yarn-proxyserver status'

debug: Service[hadoop-yarn-proxyserver](provider=redhat): Executing '/sbin/service hadoop-yarn-proxyserver start'

err: /Stage[main]/Hadoop_head_node/Hadoop::Proxyserver[proxyserver]/Service[hadoop-yarn-proxyserver]/ensure: change from stopped to running failed: Could not start Service[hadoop-yarn-proxyserver]: Execution of '/sbin/service hadoop-yarn-proxyserver start' returned 3:  at /mnt/var/lib/bootstrap-actions/1/bigtop-0.8.0/bigtop-deploy/puppet/modules/hadoop/manifests/init.pp:483

debug: Service[hadoop-yarn-proxyserver](provider=redhat): Executing '/sbin/service hadoop-yarn-proxyserver status'

debug: /Stage[main]/Hadoop_head_node/Hadoop::Proxyserver[proxyserver]/Service[hadoop-yarn-proxyserver]: Skipping restart; service is not running

notice: /Stage[main]/Hadoop_head_node/Hadoop::Proxyserver[proxyserver]/Service[hadoop-yarn-proxyserver]: Triggered 'refresh' from 4 events

debug: /Stage[main]/Hadoop_head_node/Hadoop::Proxyserver[proxyserver]/Service[hadoop-yarn-proxyserver]: The container Hadoop::Proxyserver[proxyserver] will propagate my refresh event

debug: Hadoop::Proxyserver[proxyserver]: The container Class[Hadoop_head_node] will propagate my refresh event

debug: Class[Hadoop::Common-yarn]: The container Stage[main] will propagate my refresh event

debug: Service[hadoop-hdfs-namenode](provider=redhat): Executing '/sbin/service hadoop-hdfs-namenode status'

debug: Service[hadoop-hdfs-namenode](provider=redhat): Executing '/sbin/service hadoop-hdfs-namenode start'

err: /Stage[main]/Hadoop_head_node/Hadoop::Namenode[namenode]/Service[hadoop-hdfs-namenode]/ensure: change from stopped to running failed: Could not start Service[hadoop-hdfs-namenode]: Execution of '/sbin/service hadoop-hdfs-namenode start' returned 3:  at /mnt/var/lib/bootstrap-actions/1/bigtop-0.8.0/bigtop-deploy/puppet/modules/hadoop/manifests/init.pp:335

debug: Service[hadoop-hdfs-namenode](provider=redhat): Executing '/sbin/service hadoop-hdfs-namenode status'

debug: /Stage[main]/Hadoop_head_node/Hadoop::Namenode[namenode]/Service[hadoop-hdfs-namenode]: Skipping restart; service is not running

notice: /Stage[main]/Hadoop_head_node/Hadoop::Namenode[namenode]/Service[hadoop-hdfs-namenode]: Triggered 'refresh' from 4 events

debug: /Stage[main]/Hadoop_head_node/Hadoop::Namenode[namenode]/Service[hadoop-hdfs-namenode]: The container Hadoop::Namenode[namenode] will propagate my refresh event

debug: Hadoop::Namenode[namenode]: The container Class[Hadoop_head_node] will propagate my refresh event

notice: /Stage[main]/Hadoop_worker_node/Hadoop::Datanode[datanode]/Package[hadoop-hdfs-datanode]: Dependency Service[hadoop-hdfs-namenode] has failures: true

warning: /Stage[main]/Hadoop_worker_node/Hadoop::Datanode[datanode]/Package[hadoop-hdfs-datanode]: Skipping because of failed dependencies

notice: /Stage[main]/Hadoop_worker_node/Hadoop::Datanode[datanode]/File[/mnt1/hdfs]: Dependency Service[hadoop-hdfs-namenode] has failures: true

warning: /Stage[main]/Hadoop_worker_node/Hadoop::Datanode[datanode]/File[/mnt1/hdfs]: Skipping because of failed dependencies

notice: /Stage[main]/Hadoop_worker_node/Hadoop::Datanode[datanode]/File[/mnt/hdfs]: Dependency Service[hadoop-hdfs-namenode] has failures: true


The interesting part is that if I query the namenode status repeatedly, the namenode eventually shows up as started, even though I have not taken any other actions:


[hadoop@ip-10-168-87-216 1]$ sudo /sbin/service hadoop-hdfs-namenode status

Hadoop namenode is not running                             [FAILED]

[hadoop@ip-10-168-87-216 1]$ sudo /sbin/service hadoop-hdfs-namenode status

Hadoop namenode is not running                             [FAILED]

[hadoop@ip-10-168-87-216 1]$ sudo /sbin/service hadoop-hdfs-namenode status

Hadoop namenode is running                                 [  OK  ]

The same goes for the proxy server. The problem here is that none of the other services that depend on the namenode (such as the resource manager) get installed. I am using the latest release of AmazonLinux (2014.09), which has puppet 2.7.25-1. I am not sure what to do about this issue; has anyone else experienced something like it? Should I just move to puppet 3.x and only try to install out of the Bigtop trunk (0.9.0)?
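[Editor's note: the race described above -- the init script reports failure even though the daemon comes up a few seconds later -- can be worked around by polling the status command a few times before declaring failure. A minimal sketch; the service name, retry count, and delay are illustrative, not from the thread:]

```shell
#!/bin/sh
# Poll a status command until it succeeds or we run out of attempts.
# Usage: wait_for "<command>" <retries> <delay-seconds>
wait_for() {
  cmd=$1 retries=$2 delay=$3
  i=0
  while [ "$i" -lt "$retries" ]; do
    if $cmd >/dev/null 2>&1; then
      return 0          # status check passed
    fi
    i=$((i + 1))
    sleep "$delay"
  done
  return 1              # never came up within retries * delay seconds
}

# Example: give the namenode up to 5 x 10s to report running.
# wait_for "/sbin/service hadoop-hdfs-namenode status" 5 10
```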

Re: Problem using puppet scripts to configure bigtop on AmazonLinux

Posted by ri...@reactor8.com.
Rob,

Puppet itself does not provide this type of capability, but like most
other config management solutions it can be used to install and configure
packages that do. So if a service has a configuration topology that
handles some high-availability mode, Puppet can configure this.
Similarly, as an example, if you wanted to use a process manager solution
like monit, you can write or leverage Puppet modules that configure it to
manage and monitor the daemons you want to better protect.

A general way to describe what most configuration management systems do
with respect to high availability is that they are not involved in a loop
of detecting errors and events and responding with configuration changes,
although some systems are starting to tackle things like configuration
triggers, where configuration changes can be triggered based on detected
events. My view, however, is that in most cases it is much better to do
this using the underlying service's own mechanisms, process management
solutions, or other infrastructure focused on high availability, if
available.

-Rich
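[Editor's note: the "process manager like monit" idea above boils down to a loop that probes the daemon and restarts it when the probe fails. A minimal sketch of one watchdog pass; the check and restart commands are parameterized placeholders, not Bigtop specifics:]

```shell
#!/bin/sh
# One watchdog pass: run a health check; if it fails, run the restart
# command. A supervisor like monit effectively does this on a timer.
watchdog_once() {
  check=$1 restart=$2
  if $check >/dev/null 2>&1; then
    return 0           # daemon healthy, nothing to do
  fi
  $restart             # probe failed: try to bring the daemon back
}

# Example (illustrative service name):
# while true; do
#   watchdog_once "/sbin/service hadoop-yarn-proxyserver status" \
#                 "/sbin/service hadoop-yarn-proxyserver start"
#   sleep 30
# done
```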

----- Original Message -----
From: "Leidle Rob"
To: "dev@bigtop.apache.org", "Konstantin Boudnik", "user@bigtop.apache.org"
Cc: "Rich"
Sent: Thu, 11 Dec 2014 17:37:41 +0000
Subject: Re: Problem using puppet scripts to configure bigtop on AmazonLinux

Thanks Nate, this is exactly what I was looking for. One more question —
does puppet have any mechanism for monitoring service daemons and
restarting them in the case where they have a catastrophic failure/crash?
How do others in the Bigtop world deal with high availability and ensuring
that processes are restarted when they inappropriately terminate? Does
anyone have this kind of need?

On 12/11/14, 12:26 AM, "Nate D'Amico" wrote:

>Guess breaking into two items:
>
>-detecting a failed puppet run when triggered via script/external apply
>-how many times to retry
>
>For the former, you could try to use "--detailed-exitcodes", which should
>force a non-zero exit code; your script could detect that and act
>accordingly. Remember seeing a bug a while back that mentioned you needed
>to assert that param on apply to force puppet to return non-zero on
>error. Not sure if it still exists, or what version you are running, but
>probably safe to try.
>
>As far as the number of retries, all apps/services/etc could be different;
>the only specific point of view I would offer is that, given the puppet
>apply has all the data/attributes it needs to successfully converge, after
>two failed attempts you can safely assume it failed, and then resort to a
>log check to see what the issue could be.
>
>One other aspect to consider is that the puppet converge could succeed
>but something outside causes a failure right after. Depending on the
>resiliency you need, you would want your process/other monitor to assert
>after a successful run and restart the whole converge run again, or just
>notify, etc.
>
>Does that help?
>
>
>-----Original Message-----
>From: Konstantin Boudnik [mailto:cos@apache.org]
>Sent: Wednesday, December 10, 2014 4:08 PM
>To: user@bigtop.apache.org
>Cc: dev@bigtop.apache.org; Nate D'Amico; Rich
>Subject: Re: Problem using puppet scripts to configure bigtop on
>AmazonLinux
>
>Rob,
>
>following on our IRC chat I will Cc here two guys from the community who
>know Puppet the best. Nate and Rich are likely to have the answer. Guys,
>if you can chime in on the topic - it'd be great!
>
>To reiterate it: you are looking for a way to automatically tell if a
>recipe has failed and repeat it, if required, right?
>
>On Sun, Nov 30, 2014 at 09:50PM, Leidle, Rob wrote:
>> Thanks Cos,
>>
>> This would be something that I would want to automate as it would be
>> running many times across many different clusters. Ideally I would fix
>> any issues causing the puppet scripts to not complete properly, but I
>> don't know how realistic that is in the short term, so I would like to
>> set up retry logic if that is the recommended way of doing things.
>> That's why I was hoping for some direction on how often to run the
>> retry.
>>
>> On 11/29/14, 5:12 PM, "Konstantin Boudnik" wrote:
>>
>> >On Sun, Nov 30, 2014 at 12:50AM, Leidle, Rob wrote:
>> >> Thanks Roman,
>> >>
>> >> I actually fixed the problem. I had an existing process monitoring
>> >> the daemon and restarting it if it terminated. However, puppet
>> >> encapsulates this so it is no longer needed. Also, this process was
>> >> causing the namenode service to terminate once. I removed my
>> >> existing monitoring process and everything is working fine.
>> >>
>> >> That being said, is there a recommended number of times we should
>> >> retry the puppet scripts on failure?
>> >
>> >Good to see you're coming through! As for the retries: if something
>> >doesn't work I usually check the logs immediately. Sometimes after a
>> >second re-run.
>> >
>> >Cos
>> >
>>
>


Re: Problem using puppet scripts to configure bigtop on AmazonLinux

Posted by "Leidle, Rob" <le...@amazon.com>.
Thanks Nate, this is exactly what I was looking for. One more question — 
does puppet have any mechanism for monitoring service daemons and 
restarting them in the case where they have a catastrophic failure/crash? 
How do others in the Bigtop world deal with high availability and ensuring 
that processes are restarted when they inappropriately terminate? Does 
anyone have this kind of need?




On 12/11/14, 12:26 AM, "Nate D'Amico" <na...@reactor8.com> wrote:

>Guess breaking into two items:
>
>-detecting a failed puppet run when triggered via script/external apply
>-how many times to retry
>
>For the former, you could try to use "--detailed-exitcodes", which should
>force a non-zero exit code; your script could detect that and act
>accordingly. Remember seeing a bug a while back that mentioned you needed
>to assert that param on apply to force puppet to return non-zero on
>error. Not sure if it still exists, or what version you are running, but
>probably safe to try.
>
>As far as the number of retries, all apps/services/etc could be different;
>the only specific point of view I would offer is that, given the puppet
>apply has all the data/attributes it needs to successfully converge, after
>two failed attempts you can safely assume it failed, and then resort to a
>log check to see what the issue could be.
>
>One other aspect to consider is that the puppet converge could succeed
>but something outside causes a failure right after. Depending on the
>resiliency you need, you would want your process/other monitor to assert
>after a successful run and restart the whole converge run again, or just
>notify, etc.
>
>Does that help?
>
>
>-----Original Message-----
>From: Konstantin Boudnik [mailto:cos@apache.org] 
>Sent: Wednesday, December 10, 2014 4:08 PM
>To: user@bigtop.apache.org
>Cc: dev@bigtop.apache.org; Nate D'Amico; Rich
>Subject: Re: Problem using puppet scripts to configure bigtop on 
>AmazonLinux
>
>Rob,
>
>following on our IRC chat I will Cc here two guys from the community who 
>know Puppet the best. Nate and Rich are likely to have the answer. Guys, 
>if you can chime in on the topic - it'd be great!
>
>To reiterate it: you are looking for a way to automatically tell if a
>recipe has failed and repeat it, if required, right?
>
>On Sun, Nov 30, 2014 at 09:50PM, Leidle, Rob wrote:
>> Thanks Cos,
>> 
>> This would be something that I would want to automate as it would be 
>> running many times across many different clusters. Ideally I would fix 
>> any issues causing the puppet scripts to not complete properly, but I
>> don't know how realistic that is in the short term, so I would like to
>> set up retry logic if that is the recommended way of doing things.
>> That's why I was hoping for some direction on how often to run the
>> retry.
>> 
>> On 11/29/14, 5:12 PM, "Konstantin Boudnik" <co...@apache.org> wrote:
>> 
>> >On Sun, Nov 30, 2014 at 12:50AM, Leidle, Rob wrote:
>> >> Thanks Roman,
>> >> 
>> >> I actually fixed the problem. I had an existing process monitoring 
>> >>the  daemon and restarting it if it terminated. However, puppet 
>> >>encapsulates this  so it is no longer needed. Also, this process was 
>> >>causing the namenode  service to terminate once. I removed my 
>> >>existing monitoring process and  everything is working fine.
>> >> 
>> >> That being said is there a recommended number of times we should 
>> >>retry the  puppet scripts on failure?
>> >
>> >Good to see you're coming through! As for the retries: if something 
>> >doesn't work I usually check the logs immediately. Sometimes after a
>> >second re-run.
>> >
>> >Cos
>> >
>> 
>

RE: Problem using puppet scripts to configure bigtop on AmazonLinux

Posted by Nate D'Amico <na...@reactor8.com>.
Guess breaking into two items:

-detecting a failed puppet run when triggered via script/external apply
-how many times to retry

For the former, you could try to use "--detailed-exitcodes", which should force a non-zero exit code; your script could detect that and act accordingly. Remember seeing a bug a while back that mentioned you needed to assert that param on apply to force puppet to return non-zero on error. Not sure if it still exists, or what version you are running, but probably safe to try.

As far as the number of retries, all apps/services/etc could be different; the only specific point of view I would offer is that, given the puppet apply has all the data/attributes it needs to successfully converge, after two failed attempts you can safely assume it failed, and then resort to a log check to see what the issue could be.

One other aspect to consider is that the puppet converge could succeed but something outside causes a failure right after. Depending on the resiliency you need, you would want your process/other monitor to assert after a successful run and restart the whole converge run again, or just notify, etc.

Does that help?
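[Editor's note: a hedged sketch of the suggestion above. With --detailed-exitcodes, puppet apply exits 0 (success, no changes) or 2 (success, changes applied); 1, 4, and 6 indicate errors or resource failures. The manifest path and retry count below are placeholders:]

```shell
#!/bin/sh
# Interpret "puppet apply --detailed-exitcodes" results:
# 0 = success, no changes; 2 = success, changes applied;
# 1/4/6 = error or resource failures during the run.
puppet_rc_ok() {
  case $1 in
    0|2) return 0 ;;
    *)   return 1 ;;
  esac
}

# Retry the apply a bounded number of times (per the advice above:
# after two failed attempts, assume failure and go read the logs).
apply_with_retries() {
  manifest=$1 max=$2
  attempt=1
  while [ "$attempt" -le "$max" ]; do
    puppet apply --detailed-exitcodes "$manifest"
    if puppet_rc_ok $?; then
      return 0
    fi
    echo "puppet apply failed (attempt $attempt of $max)" >&2
    attempt=$((attempt + 1))
  done
  return 1
}

# apply_with_retries /path/to/site.pp 2
```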


-----Original Message-----
From: Konstantin Boudnik [mailto:cos@apache.org] 
Sent: Wednesday, December 10, 2014 4:08 PM
To: user@bigtop.apache.org
Cc: dev@bigtop.apache.org; Nate D'Amico; Rich
Subject: Re: Problem using puppet scripts to configure bigtop on AmazonLinux

Rob,

following on our IRC chat I will Cc here two guys from the community who know Puppet the best. Nate and Rich are likely to have the answer. Guys, if you can chime in on the topic - it'd be great!

To reiterate it: you are looking for a way to automatically tell if a recipe has failed and repeat it, if required, right?

On Sun, Nov 30, 2014 at 09:50PM, Leidle, Rob wrote:
> Thanks Cos,
> 
> This would be something that I would want to automate as it would be 
> running many times across many different clusters. Ideally I would fix 
> any issues causing the puppet scripts to not complete properly, but I
> don't know how realistic that is in the short term, so I would like to
> set up retry logic if that is the recommended way of doing things.
> That's why I was hoping for some direction on how often to run the retry.
> 
> On 11/29/14, 5:12 PM, "Konstantin Boudnik" <co...@apache.org> wrote:
> 
> >On Sun, Nov 30, 2014 at 12:50AM, Leidle, Rob wrote:
> >> Thanks Roman,
> >> 
> >> I actually fixed the problem. I had an existing process monitoring 
> >>the  daemon and restarting it if it terminated. However, puppet 
> >>encapsulates this  so it is no longer needed. Also, this process was 
> >>causing the namenode  service to terminate once. I removed my 
> >>existing monitoring process and  everything is working fine.
> >> 
> >> That being said is there a recommended number of times we should 
> >>retry the  puppet scripts on failure?
> >
> >Good to see you're coming through! As for the retries: if something 
> >doesn't work I usually check the logs immediately. Sometimes after a
> >second re-run.
> >
> >Cos
> >
> 


Re: Problem using puppet scripts to configure bigtop on AmazonLinux

Posted by "Leidle, Rob" <le...@amazon.com>.
Yes -- at a bare minimum, I would like to know if the provisioning/recipe has failed to complete successfully.



> On Dec 10, 2014, at 5:25 PM, Konstantin Boudnik <co...@apache.org> wrote:
> 
> Rob,
> 
> following on our IRC chat I will Cc here two guys from the community who know
> Puppet the best. Nate and Rich are likely to have the answer. Guys, if you can
> chime in on the topic - it'd be great!
> 
> To reiterate it: you are looking for a way to automatically tell if a recipe
> has failed and repeat it, if required, right?
> 
>> On Sun, Nov 30, 2014 at 09:50PM, Leidle, Rob wrote:
>> Thanks Cos,
>> 
>> This would be something that I would want to automate as it would be
>> running many times across many different clusters. Ideally I would fix any
>> issues causing the puppet scripts to not complete properly, but I don't
>> know how realistic that is in the short term, so I would like to set up
>> retry logic if that is the recommended way of doing things. That's why I
>> was hoping for some direction on how often to run the retry.
>> 
>>> On 11/29/14, 5:12 PM, "Konstantin Boudnik" <co...@apache.org> wrote:
>>> 
>>>> On Sun, Nov 30, 2014 at 12:50AM, Leidle, Rob wrote:
>>>> Thanks Roman,
>>>> 
>>>> I actually fixed the problem. I had an existing process monitoring the
>>>> daemon and restarting it if it terminated. However, puppet encapsulates
>>>> this
>>>> so it is no longer needed. Also, this process was causing the namenode
>>>> service to terminate once. I removed my existing monitoring process and
>>>> everything is working fine.
>>>> 
>>>> That being said is there a recommended number of times we should retry
>>>> the
>>>> puppet scripts on failure?
>>> 
>>> Good to see you're coming through! As for the retries: if something
>>> doesn't work I usually check the logs immediately. Sometimes after a
>>> second re-run.
>>> 
>>> Cos
>> 

Re: Problem using puppet scripts to configure bigtop on AmazonLinux

Posted by Konstantin Boudnik <co...@apache.org>.
Rob,

following on our IRC chat I will Cc here two guys from the community who know
Puppet the best. Nate and Rich are likely to have the answer. Guys, if you can
chime in on the topic - it'd be great!

To reiterate it: you are looking for a way to automatically tell if a recipe
has failed and repeat it, if required, right?

On Sun, Nov 30, 2014 at 09:50PM, Leidle, Rob wrote:
> Thanks Cos,
> 
> This would be something that I would want to automate as it would be
> running many times across many different clusters. Ideally I would fix any
> issues causing the puppet scripts to not complete properly, but I don't
> know how realistic that is in the short term, so I would like to set up
> retry logic if that is the recommended way of doing things. That's why I
> was hoping for some direction on how often to run the retry.
> 
> On 11/29/14, 5:12 PM, "Konstantin Boudnik" <co...@apache.org> wrote:
> 
> >On Sun, Nov 30, 2014 at 12:50AM, Leidle, Rob wrote:
> >> Thanks Roman,
> >> 
> >> I actually fixed the problem. I had an existing process monitoring the
> >> daemon and restarting it if it terminated. However, puppet encapsulates
> >>this
> >> so it is no longer needed. Also, this process was causing the namenode
> >> service to terminate once. I removed my existing monitoring process and
> >> everything is working fine.
> >> 
> >> That being said is there a recommended number of times we should retry
> >>the
> >> puppet scripts on failure?
> >
> >Good to see you're coming through! As for the retries: if something
> >doesn't work I usually check the logs immediately. Sometimes after a
> >second re-run.
> >
> >Cos
> >
> 

Re: Problem using puppet scripts to configure bigtop on AmazonLinux

Posted by Roman Shaposhnik <ro...@shaposhnik.org>.
On Sun, Nov 30, 2014 at 1:50 PM, Leidle, Rob <le...@amazon.com> wrote:
> Thanks Cos,
>
> This would be something that I would want to automate as it would be
> running many times across many different clusters. Ideally I would fix any
> issues causing the puppet scripts to not complete properly, but I don't
> know how realistic that is in the short term, so I would like to set up
> retry logic if that is the recommended way of doing things. That's why I
> was hoping for some direction on how often to run the retry.

This strikes me as the most realistic approach. If, on top of that, you
could keep stats on how many times you had to retry before it converges,
that would be helpful for the rest of the Bigtoppers.

Thanks,
Roman.

Re: Problem using puppet scripts to configure bigtop on AmazonLinux

Posted by "Leidle, Rob" <le...@amazon.com>.
Thanks Cos,

This would be something that I would want to automate as it would be
running many times across many different clusters. Ideally I would fix any
issues causing the puppet scripts to not complete properly, but I don't
know how realistic that is in the short term, so I would like to set up
retry logic if that is the recommended way of doing things. That's why I
was hoping for some direction on how often to run the retry.

On 11/29/14, 5:12 PM, "Konstantin Boudnik" <co...@apache.org> wrote:

>On Sun, Nov 30, 2014 at 12:50AM, Leidle, Rob wrote:
>> Thanks Roman,
>> 
>> I actually fixed the problem. I had an existing process monitoring the
>> daemon and restarting it if it terminated. However, puppet encapsulates
>>this
>> so it is no longer needed. Also, this process was causing the namenode
>> service to terminate once. I removed my existing monitoring process and
>> everything is working fine.
>> 
>> That being said is there a recommended number of times we should retry
>>the
>> puppet scripts on failure?
>
>Good to see you're coming through! As for the retries: if something
>doesn't work I usually check the logs immediately. Sometimes after a
>second re-run.
>
>Cos
>


Re: Problem using puppet scripts to configure bigtop on AmazonLinux

Posted by Konstantin Boudnik <co...@apache.org>.
On Sun, Nov 30, 2014 at 12:50AM, Leidle, Rob wrote:
> Thanks Roman,
> 
> I actually fixed the problem. I had an existing process monitoring the
> daemon and restarting it if it terminated. However, puppet encapsulates this
> so it is no longer needed. Also, this process was causing the namenode
> service to terminate once. I removed my existing monitoring process and
> everything is working fine. 
> 
> That being said is there a recommended number of times we should retry the
> puppet scripts on failure?

Good to see you're coming through! As for the retries: if something doesn't
work I usually check the logs immediately. Sometimes after a second re-run.

Cos

> > On Nov 29, 2014, at 3:49 PM, Roman Shaposhnik <ro...@shaposhnik.org> wrote:
> > 
> >> On Fri, Nov 28, 2014 at 7:08 PM, Konstantin Boudnik <co...@apache.org> wrote:
> >>> On Sat, Nov 29, 2014 at 01:43AM, Leidle, Rob wrote:
> >>> Yes, I ran into BIGTOP-1522 and figured out I needed to add mapred-app.
> >>> Sorry, I wrote what I said in the previous email incorrectly, yes,
> >>> resource manager does not install because the dependency namenode does
> >>> not install correctly. I will look more closely at the service logs to
> >>> see if I can figure out why it isn't starting. The error code of "3"
> >>> from the /etc/init.d/hadoop-hdfs-namenode script indicates that it
> >>> can't find the running process 5 seconds after starting it.
> >> 
> >> Yes, please look into the logs - might be something obvious missed. We are
> >> running these recipes for a good 3+ years and they are fairly well tested.
> >> Would be good to fix last bugs if any ;)
> > 
> > What Cos said above, but also note that Puppet encourages this unfortunate
> > 'eventual convergence' pattern. IOW, even if the first time around a
> > few services
> > failed if everything goes OK on the next Puppet run -- the cluster comes up.
> > 
> > It would be very nice to debug the nitty gritty details of
> > synchronization issues
> > like the ones you seem to be seeing. Unfortunately, we haven't really had
> > much of a focus there, since, like I said, for internal Bigtop testing purposes
> > the 'eventual convergence' suffices.
> > 
> > Thanks,
> > Roman.

Re: Problem using puppet scripts to configure bigtop on AmazonLinux

Posted by "Leidle, Rob" <le...@amazon.com>.
Thanks Roman,

I actually fixed the problem. I had an existing process that monitored the daemon and restarted it if it terminated. However, Puppet already handles this, so that watchdog is no longer needed; it was also causing the namenode service to terminate once. I removed my monitoring process and everything is working fine.

That being said, is there a recommended number of times we should retry the puppet scripts on failure?
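
For what it's worth, the thread never settles on an official retry count, but one way to lean on Puppet's eventual convergence is a small bounded-retry wrapper. Everything below is an illustrative sketch: the count of 3 and the `puppet apply` command line are assumptions, not a Bigtop recommendation.

```shell
# Hypothetical sketch: retry a command a bounded number of times.
# Neither the retry count nor the puppet invocation comes from Bigtop docs.
retry() {
  max=$1; shift
  n=0
  until "$@"; do
    n=$((n + 1))
    if [ "$n" -ge "$max" ]; then
      echo "giving up after $max attempts" >&2
      return 1
    fi
    echo "attempt $n failed; retrying..." >&2
    sleep 1
  done
}

# Example (hypothetical invocation and paths):
# retry 3 puppet apply -d \
#   --modulepath=bigtop-deploy/puppet/modules \
#   bigtop-deploy/puppet/manifests/site.pp
```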



> On Nov 29, 2014, at 3:49 PM, Roman Shaposhnik <ro...@shaposhnik.org> wrote:
> 
>> On Fri, Nov 28, 2014 at 7:08 PM, Konstantin Boudnik <co...@apache.org> wrote:
>>> On Sat, Nov 29, 2014 at 01:43AM, Leidle, Rob wrote:
> >>> Yes, I ran into BIGTOP-1522 and figured out I needed to add mapred-app.
> >>> Sorry, I wrote what I said in the previous email incorrectly: yes,
> >>> resource manager does not install because the dependency namenode does
> >>> not install correctly. I will look more closely at the service logs to see
> >>> if I can figure out why it isn't starting. The error code of "3" from the
> >>> /etc/init.d/hadoop-hdfs-namenode script indicates that it can't find the
> >>> running process 5 seconds after starting it.
>> 
>> Yes, please look into the logs - might be something obvious missed. We are
>> running these recipes for a good 3+ years and they are fairly well tested.
>> Would be good to fix last bugs if any ;)
> 
> What Cos said above, but also note that Puppet encourages this unfortunate
> 'eventual convergence' pattern. IOW, even if the first time around a
> few services
> failed if everything goes OK on the next Puppet run -- the cluster comes up.
> 
> It would be very nice to debug the nitty gritty details of
> synchronization issues
> like the ones you seem to be seeing. Unfortunately, we haven't really had
> much of a focus there, since, like I said, for internal Bigtop testing purposes
> the 'eventual convergence' suffices.
> 
> Thanks,
> Roman.

Re: Problem using puppet scripts to configure bigtop on AmazonLinux

Posted by Roman Shaposhnik <ro...@shaposhnik.org>.
On Fri, Nov 28, 2014 at 7:08 PM, Konstantin Boudnik <co...@apache.org> wrote:
> On Sat, Nov 29, 2014 at 01:43AM, Leidle, Rob wrote:
>> Yes, I ran into BIGTOP-1522 and figured out I needed to add mapred-app.
>> Sorry, I wrote what I said in the previous email incorrectly: yes,
>> resource manager does not install because the dependency namenode does
>> not install correctly. I will look more closely at the service logs to see
>> if I can figure out why it isn't starting. The error code of "3" from the
>> /etc/init.d/hadoop-hdfs-namenode script indicates that it can't find the
>> running process 5 seconds after starting it.
>
> Yes, please look into the logs - might be something obvious missed. We are
> running these recipes for a good 3+ years and they are fairly well tested.
> Would be good to fix last bugs if any ;)

What Cos said above, but also note that Puppet encourages this unfortunate
'eventual convergence' pattern. IOW, even if a few services failed the first
time around, the cluster still comes up if everything goes OK on the next
Puppet run.

It would be very nice to debug the nitty-gritty details of synchronization
issues like the ones you seem to be seeing. Unfortunately, we haven't really
had much of a focus there since, like I said, for internal Bigtop testing
purposes the 'eventual convergence' suffices.

Thanks,
Roman.

Re: Problem using puppet scripts to configure bigtop on AmazonLinux

Posted by Konstantin Boudnik <co...@apache.org>.
On Sat, Nov 29, 2014 at 01:43AM, Leidle, Rob wrote:
> Yes, I ran into Bigtop-1522 and figured out I needed to add mapred-app.
> Sorry, I wrote what I said in the previous email incorrectly, yes,
> resource manager does not install because the dependency namenode does
> not install correctly. I will look more closely at the service logs to see
> if I can figure out why it isn't starting. The error code of "3" from the
> /etc/init.d/hadoop-hdfs-namenode script indicates that it can't find the
> running process 5 seconds after starting it.

Yes, please look into the logs - it might be something obvious that was
missed. We have been running these recipes for a good 3+ years and they are
fairly well tested. Would be good to fix the last bugs, if any ;)

Cos
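
One way to follow this advice is to surface the FATAL/ERROR lines from the daemon log. The helper below is a hedged sketch; the /var/log/hadoop-hdfs path in the usage comment is the conventional packaged-Hadoop location and is an assumption, not something stated in this thread.

```shell
# Hedged helper: print the most recent FATAL/ERROR lines from a daemon log.
scan_log() {
  logfile=$1
  if [ ! -r "$logfile" ]; then
    echo "cannot read $logfile" >&2
    return 1
  fi
  grep -E 'FATAL|ERROR' "$logfile" | tail -n 20
}

# Hypothetical usage on a packaged-Hadoop layout (path is an assumption):
# scan_log "/var/log/hadoop-hdfs/hadoop-hdfs-namenode-$(hostname).log"
```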

> 
> On 11/28/14, 4:14 PM, "Konstantin Boudnik" <co...@apache.org> wrote:
> 
> >On Fri, Nov 28, 2014 at 08:06PM, Leidle, Rob wrote:
> >> Hello all, I am trying to configure & install Bigtop 0.8.0 using the
> >>puppet scripts on AmazonLinux on EC2. Thus far, almost everything has
> >>worked besides one minor change I have made to the site.pp manifest.
> >>However, I am running into a problem, it seems that the services such as
> >>proxy server or namenode are not immediately starting. You can see the
> >>error below in the purple text related to namenode.
> >> 
> >> 
> >> info: 
> >>/Stage[main]/Hadoop::Common-yarn/File[/etc/hadoop/conf/yarn-site.xml]:
> >>Scheduling refresh of Service[hadoop-yarn-resourcemanager]
> >> 
> >> info: 
> >>/Stage[main]/Hadoop::Common-yarn/File[/etc/hadoop/conf/yarn-site.xml]:
> >>Scheduling refresh of Service[hadoop-mapreduce-historyserver]
> >> 
> >> info: 
> >>/Stage[main]/Hadoop::Common-yarn/File[/etc/hadoop/conf/yarn-site.xml]:
> >>Scheduling refresh of Service[hadoop-yarn-nodemanager]
> >> 
> >> info: 
> >>/Stage[main]/Hadoop::Common-yarn/File[/etc/hadoop/conf/yarn-site.xml]:
> >>Scheduling refresh of Service[hadoop-yarn-proxyserver]
> >> 
> >> debug: 
> >>/Stage[main]/Hadoop::Common-yarn/File[/etc/hadoop/conf/yarn-site.xml]:
> >>The container Class[Hadoop::Common-yarn] will propagate my refresh event
> >> 
> >> debug: Service[hadoop-yarn-proxyserver](provider=redhat): Executing
> >>'/sbin/service hadoop-yarn-proxyserver status'
> >> 
> >> debug: Service[hadoop-yarn-proxyserver](provider=redhat): Executing
> >>'/sbin/service hadoop-yarn-proxyserver start'
> >> 
> >> err: 
> >>/Stage[main]/Hadoop_head_node/Hadoop::Proxyserver[proxyserver]/Service[ha
> >>doop-yarn-proxyserver]/ensure: change from stopped to running failed:
> >>Could not start Service[hadoop-yarn-proxyserver]: Execution of
> >>'/sbin/service hadoop-yarn-proxyserver start' returned 3:  at
> >>/mnt/var/lib/bootstrap-actions/1/bigtop-0.8.0/bigtop-deploy/puppet/module
> >>s/hadoop/manifests/init.pp:483
> >> 
> >> debug: Service[hadoop-yarn-proxyserver](provider=redhat): Executing
> >>'/sbin/service hadoop-yarn-proxyserver status'
> >> 
> >> debug: 
> >>/Stage[main]/Hadoop_head_node/Hadoop::Proxyserver[proxyserver]/Service[ha
> >>doop-yarn-proxyserver]: Skipping restart; service is not running
> >> 
> >> notice: 
> >>/Stage[main]/Hadoop_head_node/Hadoop::Proxyserver[proxyserver]/Service[ha
> >>doop-yarn-proxyserver]: Triggered 'refresh' from 4 events
> >> 
> >> debug: 
> >>/Stage[main]/Hadoop_head_node/Hadoop::Proxyserver[proxyserver]/Service[ha
> >>doop-yarn-proxyserver]: The container Hadoop::Proxyserver[proxyserver]
> >>will propagate my refresh event
> >> 
> >> debug: Hadoop::Proxyserver[proxyserver]: The container
> >>Class[Hadoop_head_node] will propagate my refresh event
> >> 
> >> debug: Class[Hadoop::Common-yarn]: The container Stage[main] will
> >>propagate my refresh event
> >> 
> >> debug: Service[hadoop-hdfs-namenode](provider=redhat): Executing
> >>'/sbin/service hadoop-hdfs-namenode status'
> >> 
> >> debug: Service[hadoop-hdfs-namenode](provider=redhat): Executing
> >>'/sbin/service hadoop-hdfs-namenode start'
> >> 
> >> err: 
> >>/Stage[main]/Hadoop_head_node/Hadoop::Namenode[namenode]/Service[hadoop-h
> >>dfs-namenode]/ensure: change from stopped to running failed: Could not
> >>start Service[hadoop-hdfs-namenode]: Execution of '/sbin/service
> >>hadoop-hdfs-namenode start' returned 3:  at
> >>/mnt/var/lib/bootstrap-actions/1/bigtop-0.8.0/bigtop-deploy/puppet/module
> >>s/hadoop/manifests/init.pp:335
> >> 
> >> debug: Service[hadoop-hdfs-namenode](provider=redhat): Executing
> >>'/sbin/service hadoop-hdfs-namenode status'
> >> 
> >> debug: 
> >>/Stage[main]/Hadoop_head_node/Hadoop::Namenode[namenode]/Service[hadoop-h
> >>dfs-namenode]: Skipping restart; service is not running
> >> 
> >> notice: 
> >>/Stage[main]/Hadoop_head_node/Hadoop::Namenode[namenode]/Service[hadoop-h
> >>dfs-namenode]: Triggered 'refresh' from 4 events
> >> 
> >> debug: 
> >>/Stage[main]/Hadoop_head_node/Hadoop::Namenode[namenode]/Service[hadoop-h
> >>dfs-namenode]: The container Hadoop::Namenode[namenode] will propagate
> >>my refresh event
> >> 
> >> debug: Hadoop::Namenode[namenode]: The container
> >>Class[Hadoop_head_node] will propagate my refresh event
> >> 
> >> notice: 
> >>/Stage[main]/Hadoop_worker_node/Hadoop::Datanode[datanode]/Package[hadoop
> >>-hdfs-datanode]: Dependency Service[hadoop-hdfs-namenode] has failures:
> >>true
> >> 
> >> warning: 
> >>/Stage[main]/Hadoop_worker_node/Hadoop::Datanode[datanode]/Package[hadoop
> >>-hdfs-datanode]: Skipping because of failed dependencies
> >> 
> >> notice: 
> >>/Stage[main]/Hadoop_worker_node/Hadoop::Datanode[datanode]/File[/mnt1/hdf
> >>s]: Dependency Service[hadoop-hdfs-namenode] has failures: true
> >> 
> >> warning: 
> >>/Stage[main]/Hadoop_worker_node/Hadoop::Datanode[datanode]/File[/mnt1/hdf
> >>s]: Skipping because of failed dependencies
> >> 
> >> notice: 
> >>/Stage[main]/Hadoop_worker_node/Hadoop::Datanode[datanode]/File[/mnt/hdfs
> >>]: Dependency Service[hadoop-hdfs-namenode] has failures: true
> >> 
> >> 
> >> The interesting part is that if I query namenode status eventually the
> >>namenode will show up as started even though I have not taken any other
> >>actions:
> >> 
> >> 
> >> [hadoop@ip-10-168-87-216 1]$ sudo /sbin/service hadoop-hdfs-namenode
> >>status
> >> 
> >> Hadoop namenode is not running                             [FAILED]
> >> 
> >> [hadoop@ip-10-168-87-216 1]$ sudo /sbin/service hadoop-hdfs-namenode
> >>status
> >> 
> >> Hadoop namenode is not running                             [FAILED]
> >> 
> >> [hadoop@ip-10-168-87-216 1]$ sudo /sbin/service hadoop-hdfs-namenode
> >>status
> >> 
> >> Hadoop namenode is running                                 [  OK  ]
> >> 
> >> The same goes for proxy server. The problem here is that all other
> >> dependencies of namenode do not install (such as resource manager,
> >>etc). I
> >> am using the latest release of AmazonLinux (2014.09) and this has puppet
> >> 2.7.25-1. I am not sure what to do about this issue, has anyone else
> >> experienced something like this? Should I just move to puppet 3.x and
> >>only
> >> try to install out of the Bigtop trunk (0.9.0)?
> >
> >ResourceManager isn't a dependency of namenode - it's the other way
> >around.
> >It's hard to say what's going on with your system without looking into
> >particular daemon logs. I'd suggest you check them and investigate what
> >the trouble is.
> >
> >Also, there's a small issue BIGTOP-1522 with nodemanager recipes if you're
> >installing a custom set of components, which might or might not affect you
> >
> >Cos
> >
> 

Re: Problem using puppet scripts to configure bigtop on AmazonLinux

Posted by "Leidle, Rob" <le...@amazon.com>.
Yes, I ran into BIGTOP-1522 and figured out I needed to add mapred-app.
Sorry, I wrote what I said in the previous email incorrectly: yes,
resource manager does not install because the dependency namenode does
not install correctly. I will look more closely at the service logs to see
if I can figure out why it isn't starting. The error code of "3" from the
/etc/init.d/hadoop-hdfs-namenode script indicates that it can't find the
running process 5 seconds after starting it.
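
The init-script behavior described above (start, wait, then look for the process) can be paraphrased as a sketch. To be clear, this is not the actual /etc/init.d/hadoop-hdfs-namenode script; the 5-second grace period is taken from the description in this thread, and LSB init scripts conventionally use exit code 3 for "program is not running".

```shell
# Hedged paraphrase of the described init-script check, not the real script.
start_and_check() {
  pattern=$1; shift
  "$@" &                # launch the daemon command in the background
  sleep 5               # grace period described in the thread
  if pgrep -f "$pattern" > /dev/null; then
    echo "running"
    return 0
  fi
  echo "not running"
  return 3              # LSB: program is not running
}
```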

On 11/28/14, 4:14 PM, "Konstantin Boudnik" <co...@apache.org> wrote:

>On Fri, Nov 28, 2014 at 08:06PM, Leidle, Rob wrote:
>> Hello all, I am trying to configure & install Bigtop 0.8.0 using the
>>puppet scripts on AmazonLinux on EC2. Thus far, almost everything has
>>worked besides one minor change I have made to the site.pp manifest.
>>However, I am running into a problem, it seems that the services such as
>>proxy server or namenode are not immediately starting. You can see the
>>error below in the purple text related to namenode.
>> 
>> 
>> info: 
>>/Stage[main]/Hadoop::Common-yarn/File[/etc/hadoop/conf/yarn-site.xml]:
>>Scheduling refresh of Service[hadoop-yarn-resourcemanager]
>> 
>> info: 
>>/Stage[main]/Hadoop::Common-yarn/File[/etc/hadoop/conf/yarn-site.xml]:
>>Scheduling refresh of Service[hadoop-mapreduce-historyserver]
>> 
>> info: 
>>/Stage[main]/Hadoop::Common-yarn/File[/etc/hadoop/conf/yarn-site.xml]:
>>Scheduling refresh of Service[hadoop-yarn-nodemanager]
>> 
>> info: 
>>/Stage[main]/Hadoop::Common-yarn/File[/etc/hadoop/conf/yarn-site.xml]:
>>Scheduling refresh of Service[hadoop-yarn-proxyserver]
>> 
>> debug: 
>>/Stage[main]/Hadoop::Common-yarn/File[/etc/hadoop/conf/yarn-site.xml]:
>>The container Class[Hadoop::Common-yarn] will propagate my refresh event
>> 
>> debug: Service[hadoop-yarn-proxyserver](provider=redhat): Executing
>>'/sbin/service hadoop-yarn-proxyserver status'
>> 
>> debug: Service[hadoop-yarn-proxyserver](provider=redhat): Executing
>>'/sbin/service hadoop-yarn-proxyserver start'
>> 
>> err: 
>>/Stage[main]/Hadoop_head_node/Hadoop::Proxyserver[proxyserver]/Service[ha
>>doop-yarn-proxyserver]/ensure: change from stopped to running failed:
>>Could not start Service[hadoop-yarn-proxyserver]: Execution of
>>'/sbin/service hadoop-yarn-proxyserver start' returned 3:  at
>>/mnt/var/lib/bootstrap-actions/1/bigtop-0.8.0/bigtop-deploy/puppet/module
>>s/hadoop/manifests/init.pp:483
>> 
>> debug: Service[hadoop-yarn-proxyserver](provider=redhat): Executing
>>'/sbin/service hadoop-yarn-proxyserver status'
>> 
>> debug: 
>>/Stage[main]/Hadoop_head_node/Hadoop::Proxyserver[proxyserver]/Service[ha
>>doop-yarn-proxyserver]: Skipping restart; service is not running
>> 
>> notice: 
>>/Stage[main]/Hadoop_head_node/Hadoop::Proxyserver[proxyserver]/Service[ha
>>doop-yarn-proxyserver]: Triggered 'refresh' from 4 events
>> 
>> debug: 
>>/Stage[main]/Hadoop_head_node/Hadoop::Proxyserver[proxyserver]/Service[ha
>>doop-yarn-proxyserver]: The container Hadoop::Proxyserver[proxyserver]
>>will propagate my refresh event
>> 
>> debug: Hadoop::Proxyserver[proxyserver]: The container
>>Class[Hadoop_head_node] will propagate my refresh event
>> 
>> debug: Class[Hadoop::Common-yarn]: The container Stage[main] will
>>propagate my refresh event
>> 
>> debug: Service[hadoop-hdfs-namenode](provider=redhat): Executing
>>'/sbin/service hadoop-hdfs-namenode status'
>> 
>> debug: Service[hadoop-hdfs-namenode](provider=redhat): Executing
>>'/sbin/service hadoop-hdfs-namenode start'
>> 
>> err: 
>>/Stage[main]/Hadoop_head_node/Hadoop::Namenode[namenode]/Service[hadoop-h
>>dfs-namenode]/ensure: change from stopped to running failed: Could not
>>start Service[hadoop-hdfs-namenode]: Execution of '/sbin/service
>>hadoop-hdfs-namenode start' returned 3:  at
>>/mnt/var/lib/bootstrap-actions/1/bigtop-0.8.0/bigtop-deploy/puppet/module
>>s/hadoop/manifests/init.pp:335
>> 
>> debug: Service[hadoop-hdfs-namenode](provider=redhat): Executing
>>'/sbin/service hadoop-hdfs-namenode status'
>> 
>> debug: 
>>/Stage[main]/Hadoop_head_node/Hadoop::Namenode[namenode]/Service[hadoop-h
>>dfs-namenode]: Skipping restart; service is not running
>> 
>> notice: 
>>/Stage[main]/Hadoop_head_node/Hadoop::Namenode[namenode]/Service[hadoop-h
>>dfs-namenode]: Triggered 'refresh' from 4 events
>> 
>> debug: 
>>/Stage[main]/Hadoop_head_node/Hadoop::Namenode[namenode]/Service[hadoop-h
>>dfs-namenode]: The container Hadoop::Namenode[namenode] will propagate
>>my refresh event
>> 
>> debug: Hadoop::Namenode[namenode]: The container
>>Class[Hadoop_head_node] will propagate my refresh event
>> 
>> notice: 
>>/Stage[main]/Hadoop_worker_node/Hadoop::Datanode[datanode]/Package[hadoop
>>-hdfs-datanode]: Dependency Service[hadoop-hdfs-namenode] has failures:
>>true
>> 
>> warning: 
>>/Stage[main]/Hadoop_worker_node/Hadoop::Datanode[datanode]/Package[hadoop
>>-hdfs-datanode]: Skipping because of failed dependencies
>> 
>> notice: 
>>/Stage[main]/Hadoop_worker_node/Hadoop::Datanode[datanode]/File[/mnt1/hdf
>>s]: Dependency Service[hadoop-hdfs-namenode] has failures: true
>> 
>> warning: 
>>/Stage[main]/Hadoop_worker_node/Hadoop::Datanode[datanode]/File[/mnt1/hdf
>>s]: Skipping because of failed dependencies
>> 
>> notice: 
>>/Stage[main]/Hadoop_worker_node/Hadoop::Datanode[datanode]/File[/mnt/hdfs
>>]: Dependency Service[hadoop-hdfs-namenode] has failures: true
>> 
>> 
>> The interesting part is that if I query namenode status eventually the
>>namenode will show up as started even though I have not taken any other
>>actions:
>> 
>> 
>> [hadoop@ip-10-168-87-216 1]$ sudo /sbin/service hadoop-hdfs-namenode
>>status
>> 
>> Hadoop namenode is not running                             [FAILED]
>> 
>> [hadoop@ip-10-168-87-216 1]$ sudo /sbin/service hadoop-hdfs-namenode
>>status
>> 
>> Hadoop namenode is not running                             [FAILED]
>> 
>> [hadoop@ip-10-168-87-216 1]$ sudo /sbin/service hadoop-hdfs-namenode
>>status
>> 
>> Hadoop namenode is running                                 [  OK  ]
>> 
>> The same goes for proxy server. The problem here is that all other
>> dependencies of namenode do not install (such as resource manager,
>>etc). I
>> am using the latest release of AmazonLinux (2014.09) and this has puppet
>> 2.7.25-1. I am not sure what to do about this issue, has anyone else
>> experienced something like this? Should I just move to puppet 3.x and
>>only
>> try to install out of the Bigtop trunk (0.9.0)?
>
>ResourceManager isn't a dependency of namenode - it's the other way
>around.
>It's hard to say what's going on with your system without looking into
>particular daemon logs. I'd suggest you check them and investigate what
>the trouble is.
>
>Also, there's a small issue BIGTOP-1522 with nodemanager recipes if you're
>installing a custom set of components, which might or might not affect you
>
>Cos
>


Re: Problem using puppet scripts to configure bigtop on AmazonLinux

Posted by Konstantin Boudnik <co...@apache.org>.
On Fri, Nov 28, 2014 at 08:06PM, Leidle, Rob wrote:
> Hello all, I am trying to configure & install Bigtop 0.8.0 using the puppet scripts on AmazonLinux on EC2. Thus far, almost everything has worked besides one minor change I have made to the site.pp manifest. However, I am running into a problem, it seems that the services such as proxy server or namenode are not immediately starting. You can see the error below in the purple text related to namenode.
> 
> 
> info: /Stage[main]/Hadoop::Common-yarn/File[/etc/hadoop/conf/yarn-site.xml]: Scheduling refresh of Service[hadoop-yarn-resourcemanager]
> 
> info: /Stage[main]/Hadoop::Common-yarn/File[/etc/hadoop/conf/yarn-site.xml]: Scheduling refresh of Service[hadoop-mapreduce-historyserver]
> 
> info: /Stage[main]/Hadoop::Common-yarn/File[/etc/hadoop/conf/yarn-site.xml]: Scheduling refresh of Service[hadoop-yarn-nodemanager]
> 
> info: /Stage[main]/Hadoop::Common-yarn/File[/etc/hadoop/conf/yarn-site.xml]: Scheduling refresh of Service[hadoop-yarn-proxyserver]
> 
> debug: /Stage[main]/Hadoop::Common-yarn/File[/etc/hadoop/conf/yarn-site.xml]: The container Class[Hadoop::Common-yarn] will propagate my refresh event
> 
> debug: Service[hadoop-yarn-proxyserver](provider=redhat): Executing '/sbin/service hadoop-yarn-proxyserver status'
> 
> debug: Service[hadoop-yarn-proxyserver](provider=redhat): Executing '/sbin/service hadoop-yarn-proxyserver start'
> 
> err: /Stage[main]/Hadoop_head_node/Hadoop::Proxyserver[proxyserver]/Service[hadoop-yarn-proxyserver]/ensure: change from stopped to running failed: Could not start Service[hadoop-yarn-proxyserver]: Execution of '/sbin/service hadoop-yarn-proxyserver start' returned 3:  at /mnt/var/lib/bootstrap-actions/1/bigtop-0.8.0/bigtop-deploy/puppet/modules/hadoop/manifests/init.pp:483
> 
> debug: Service[hadoop-yarn-proxyserver](provider=redhat): Executing '/sbin/service hadoop-yarn-proxyserver status'
> 
> debug: /Stage[main]/Hadoop_head_node/Hadoop::Proxyserver[proxyserver]/Service[hadoop-yarn-proxyserver]: Skipping restart; service is not running
> 
> notice: /Stage[main]/Hadoop_head_node/Hadoop::Proxyserver[proxyserver]/Service[hadoop-yarn-proxyserver]: Triggered 'refresh' from 4 events
> 
> debug: /Stage[main]/Hadoop_head_node/Hadoop::Proxyserver[proxyserver]/Service[hadoop-yarn-proxyserver]: The container Hadoop::Proxyserver[proxyserver] will propagate my refresh event
> 
> debug: Hadoop::Proxyserver[proxyserver]: The container Class[Hadoop_head_node] will propagate my refresh event
> 
> debug: Class[Hadoop::Common-yarn]: The container Stage[main] will propagate my refresh event
> 
> debug: Service[hadoop-hdfs-namenode](provider=redhat): Executing '/sbin/service hadoop-hdfs-namenode status'
> 
> debug: Service[hadoop-hdfs-namenode](provider=redhat): Executing '/sbin/service hadoop-hdfs-namenode start'
> 
> err: /Stage[main]/Hadoop_head_node/Hadoop::Namenode[namenode]/Service[hadoop-hdfs-namenode]/ensure: change from stopped to running failed: Could not start Service[hadoop-hdfs-namenode]: Execution of '/sbin/service hadoop-hdfs-namenode start' returned 3:  at /mnt/var/lib/bootstrap-actions/1/bigtop-0.8.0/bigtop-deploy/puppet/modules/hadoop/manifests/init.pp:335
> 
> debug: Service[hadoop-hdfs-namenode](provider=redhat): Executing '/sbin/service hadoop-hdfs-namenode status'
> 
> debug: /Stage[main]/Hadoop_head_node/Hadoop::Namenode[namenode]/Service[hadoop-hdfs-namenode]: Skipping restart; service is not running
> 
> notice: /Stage[main]/Hadoop_head_node/Hadoop::Namenode[namenode]/Service[hadoop-hdfs-namenode]: Triggered 'refresh' from 4 events
> 
> debug: /Stage[main]/Hadoop_head_node/Hadoop::Namenode[namenode]/Service[hadoop-hdfs-namenode]: The container Hadoop::Namenode[namenode] will propagate my refresh event
> 
> debug: Hadoop::Namenode[namenode]: The container Class[Hadoop_head_node] will propagate my refresh event
> 
> notice: /Stage[main]/Hadoop_worker_node/Hadoop::Datanode[datanode]/Package[hadoop-hdfs-datanode]: Dependency Service[hadoop-hdfs-namenode] has failures: true
> 
> warning: /Stage[main]/Hadoop_worker_node/Hadoop::Datanode[datanode]/Package[hadoop-hdfs-datanode]: Skipping because of failed dependencies
> 
> notice: /Stage[main]/Hadoop_worker_node/Hadoop::Datanode[datanode]/File[/mnt1/hdfs]: Dependency Service[hadoop-hdfs-namenode] has failures: true
> 
> warning: /Stage[main]/Hadoop_worker_node/Hadoop::Datanode[datanode]/File[/mnt1/hdfs]: Skipping because of failed dependencies
> 
> notice: /Stage[main]/Hadoop_worker_node/Hadoop::Datanode[datanode]/File[/mnt/hdfs]: Dependency Service[hadoop-hdfs-namenode] has failures: true
> 
> 
> The interesting part is that if I query namenode status eventually the namenode will show up as started even though I have not taken any other actions:
> 
> 
> [hadoop@ip-10-168-87-216 1]$ sudo /sbin/service hadoop-hdfs-namenode status
> 
> Hadoop namenode is not running                             [FAILED]
> 
> [hadoop@ip-10-168-87-216 1]$ sudo /sbin/service hadoop-hdfs-namenode status
> 
> Hadoop namenode is not running                             [FAILED]
> 
> [hadoop@ip-10-168-87-216 1]$ sudo /sbin/service hadoop-hdfs-namenode status
> 
> Hadoop namenode is running                                 [  OK  ]
> 
> The same goes for proxy server. The problem here is that all other
> dependencies of namenode do not install (such as resource manager, etc). I
> am using the latest release of AmazonLinux (2014.09) and this has puppet
> 2.7.25-1. I am not sure what to do about this issue, has anyone else
> experienced something like this? Should I just move to puppet 3.x and only
> try to install out of the Bigtop trunk (0.9.0)?

ResourceManager isn't a dependency of namenode - it's the other way around.
It's hard to say what's going on with your system without looking into the
particular daemon logs. I'd suggest you check them and investigate what the
trouble is.
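For reference, here's a minimal sketch of how one might triage a "returned 3"
failure on the node itself. check_daemon is a hypothetical helper (not part of
Bigtop), and the log path in the comment is the usual Bigtop default, which may
differ on a custom image; exit codes 0 and 3 are the LSB status codes for
"running" and "not running".

```shell
# Hedged sketch: classify a service's LSB status exit code.
# check_daemon is a made-up helper for illustration only.
check_daemon() {
  "$@" >/dev/null 2>&1
  case $? in
    0) echo running ;;   # LSB: program is running
    3) echo stopped ;;   # LSB: program is not running
    *) echo unknown ;;
  esac
}

# On the head node you would run something like:
#   check_daemon sudo /sbin/service hadoop-hdfs-namenode status
#   sudo tail -n 50 /var/log/hadoop-hdfs/hadoop-hdfs-namenode-*.log
```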

Also, there's a small issue BIGTOP-1522 with nodemanager recipes if you're
installing a custom set of components, which might or might not affect you.
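The behaviour reported above - the namenode showing up as running only on a
later status call - suggests the daemon finishes starting after the init script
has already returned. As a stopgap while investigating, one could poll the
status command until it succeeds before re-applying the manifests. This is a
hedged sketch: wait_for_status is a hypothetical helper, and the retry count
and delay are placeholders.

```shell
# Hedged sketch: poll a status command until it succeeds or retries run out.
# wait_for_status is a made-up helper, not part of Bigtop or Puppet.
wait_for_status() {
  local retries="$1" delay="$2"
  shift 2
  local i
  for i in $(seq 1 "$retries"); do
    if "$@" >/dev/null 2>&1; then
      return 0   # status command succeeded: daemon is up
    fi
    sleep "$delay"
  done
  return 1       # gave up after retries
}

# e.g. wait up to 10 x 3s for the namenode to report running:
#   wait_for_status 10 3 sudo /sbin/service hadoop-hdfs-namenode status
```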

Cos


Re: Problem using puppet scripts to configure bigtop on AmazonLinux

Posted by Konstantin Boudnik <co...@apache.org>.