Posted to dev@bigtop.apache.org by Konstantin Boudnik <co...@apache.org> on 2014/12/11 01:07:46 UTC

Re: Problem using puppet scripts to configure bigtop on AmazonLinux

Rob,

following on our IRC chat I will Cc here two guys from the community who know
Puppet the best. Nate and Rich are likely to have the answer. Guys, if you can
chime in on the topic - it'd be great!

To reiterate it: you are looking for a way to automatically tell if a recipe
has failed and repeat it, if required, right?

On Sun, Nov 30, 2014 at 09:50PM, Leidle, Rob wrote:
> Thanks Cos,
> 
> This would be something that I would want to automate as it would be
> running many times across many different clusters. Ideally I would fix any
> issues causing the puppet scripts to not complete properly, but I don't
> know how realistic that is in the short term so I would like to setup
> retry logic if that is the recommended way of doing things. That's why I
> was hoping for some direction on how often to run the retry.
> 
> On 11/29/14, 5:12 PM, "Konstantin Boudnik" <co...@apache.org> wrote:
> 
> >On Sun, Nov 30, 2014 at 12:50AM, Leidle, Rob wrote:
> >> Thanks Roman,
> >> 
> >> I actually fixed the problem. I had an existing process monitoring the
> >> daemon and restarting it if it terminated. However, puppet encapsulates
> >> this so it is no longer needed. Also, this process was causing the
> >> namenode service to terminate once. I removed my existing monitoring
> >> process and everything is working fine.
> >> 
> >> That being said, is there a recommended number of times we should retry
> >> the puppet scripts on failure?
> >
> >Good to see you're coming through! As for the retries: if something
> >doesn't work I usually check the logs immediately. Sometimes after a
> >second re-run.
> >
> >Cos
> >
>

Re: Problem using puppet scripts to configure bigtop on AmazonLinux

Posted by ri...@reactor8.com.

Rob,

Puppet itself does not provide this type of capability, but like most
other config management solutions it can be used to install and
configure packages that do. So if a service has a configuration topology
that handles some high availability mode, Puppet can configure this.
Similarly, as an example, if you wanted to use a process manager
solution like monit, you can write or leverage Puppet modules that
configure it to manage and monitor the daemons you want to better
protect.

A general way to describe what most configuration management systems
do with respect to high availability is that they are not involved
in a loop of detecting errors and events and responding with
configuration changes, although some systems are starting to tackle
things like "configuration triggers", where configuration changes can be
triggered based on detected events. My view, however, is that in most
cases it is much better to do this using the underlying service's
mechanism, process management solutions, or other infrastructure focused
on high availability, if available.

-Rich

----- Original Message -----
From: "Leidle Rob" 
To:"dev@bigtop.apache.org" , "Konstantin Boudnik" ,
"user@bigtop.apache.org" 
Cc:"Rich" 
Sent:Thu, 11 Dec 2014 17:37:41 +0000
Subject:Re: Problem using puppet scripts to configure bigtop on
AmazonLinux

 Thanks Nate, this is exactly what I was looking for. One more question --
 does puppet have any mechanism for monitoring service daemons and
 restarting them in the case where they have a catastrophic failure/crash?
 How do others in the Bigtop world deal with high availability and ensuring
 that processes are restarted when they inappropriately terminate? Does
 anyone have this kind of need?


Re: Problem using puppet scripts to configure bigtop on AmazonLinux

Posted by "Leidle, Rob" <le...@amazon.com>.

Thanks Nate, this is exactly what I was looking for. One more question — 
does puppet have any mechanism for monitoring service daemons and 
restarting them in the case where they have a catastrophic failure/crash? 
How do others in the Bigtop world deal with high availability and ensuring 
that processes are restarted when they inappropriately terminate? Does 
anyone have this kind of need?




On 12/11/14, 12:26 AM, "Nate D'Amico" <na...@reactor8.com> wrote:

>Guess breaking into two items:
>
>-detecting a failed puppet run when triggered via script/external apply
>-how many times to retry
>
>For the former, you could try to use " --detailed-exitcodes" which should 
>force a non-zero exit code, your script could detect that and act 
>accordingly.  Remember seeing a bug while back mentioned that you needed 
>to assert that param on apply to force puppet to return non-zero on 
>error.  Not sure if still exists, or what version you are running but 
>safe to probably try.
>
>As far as number of retries, all apps/services/etc could be different.., 
>only specific point of view I would say is given the puppet apply has all 
>data/attributes it needs to successfully converge, after two failed 
>attempts you can safely assume failed, and then resort to log check to 
>see what issue could be.
>
>One other aspect to consider is that the puppet converge could succeed 
>but something outside causes a failure right after.  Depending on 
>resiliency you would want your process/other monitor to assert after a 
>successful run, and restart the whole converge run again.., or just 
>notify, or etc.
>
>Does that help?

RE: Problem using puppet scripts to configure bigtop on AmazonLinux

Posted by Nate D'Amico <na...@reactor8.com>.

I guess this breaks into two items:

-detecting a failed puppet run when triggered via script/external apply
-how many times to retry

For the former, you could try " --detailed-exitcodes", which should force a non-zero exit code on failure; your script could detect that and act accordingly. I remember seeing a bug a while back that mentioned you needed to assert that param on apply to force puppet to return non-zero on error. Not sure if it still exists, or what version you are running, but it is probably safe to try.

As far as the number of retries goes, all apps/services/etc. could be different. The only specific point of view I would offer is that, given the puppet apply has all the data/attributes it needs to successfully converge, after two failed attempts you can safely assume it has failed, and then resort to a log check to see what the issue could be.
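
A minimal sketch of the detect-and-retry idea above, in Python. It is an illustration under stated assumptions, not something taken from the Bigtop tree: the function names (run_puppet, converge), the default of two attempts, and the 30-second pause are all made up for the example. With --detailed-exitcodes, puppet apply documents exit code 0 as "no changes", 2 as "changes applied", 4 as "some resources failed" and 6 as "changes and failures", with 1 meaning the run itself failed, so the sketch treats only 0 and 2 as success.

```python
import subprocess
import time

# Exit codes documented for `puppet apply --detailed-exitcodes`:
#   0 = no changes, 2 = changes applied (both count as success here)
#   1 = run failed, 4 = resource failures, 6 = changes plus failures
SUCCESS_CODES = {0, 2}


def run_puppet(manifest):
    """Run a single `puppet apply` and return its exit code."""
    cmd = ["puppet", "apply", "--detailed-exitcodes", manifest]
    return subprocess.call(cmd)


def converge(manifest, max_attempts=2, delay_seconds=30,
             runner=run_puppet, sleep=time.sleep):
    """Return True once a run succeeds, False after max_attempts failures.

    `runner` and `sleep` are injectable (an assumption of this sketch)
    so the retry logic can be exercised without puppet installed.
    """
    for attempt in range(1, max_attempts + 1):
        code = runner(manifest)
        if code in SUCCESS_CODES:
            return True
        print("puppet attempt %d failed with exit code %d" % (attempt, code))
        if attempt < max_attempts:
            sleep(delay_seconds)
    return False
```

A wrapper script might call converge("site.pp") (a hypothetical manifest path) and, on False, stop retrying and point the operator at the puppet logs, in line with the two-attempt rule of thumb above.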

One other aspect to consider is that the puppet converge could succeed but something outside causes a failure right after. Depending on resiliency, you would want your process/other monitor to assert after a successful run, and restart the whole converge run again, or just notify, etc.
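
A rough sketch of such a post-run assertion, in Python. It assumes the daemons are installed as init-style services whose "status" action exits 0 when the service is running (the LSB convention); the function names and the service name in the usage note are assumptions for illustration only.

```python
import subprocess


def service_is_running(name, runner=subprocess.call):
    """Return True when `service <name> status` exits with 0."""
    # LSB-style init scripts are expected to exit 0 from `status` when
    # the service is running; any other code is treated as "not running".
    return runner(["service", name, "status"]) == 0


def failed_services(names, runner=subprocess.call):
    """Return the names whose status check did not exit 0, in order."""
    return [name for name in names if not service_is_running(name, runner)]
```

After a successful converge, a monitor could call failed_services(["hadoop-hdfs-namenode"]) (service name assumed) and, if the list is non-empty, either trigger the whole converge run again or just send a notification.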

Does that help?

-----Original Message-----
From: Konstantin Boudnik [mailto:cos@apache.org] 
Sent: Wednesday, December 10, 2014 4:08 PM
To: user@bigtop.apache.org
Cc: dev@bigtop.apache.org; Nate D'Amico; Rich
Subject: Re: Problem using puppet scripts to configure bigtop on AmazonLinux

Rob,

following on our IRC chat I will Cc here two guys from the community who know Puppet the best. Nate and Rich are likely to have the answer. Guys, if you can chime in on the topic - it'd be great!

To reiterate it: you are looking for a way to automatically tell if a recipe has failed and repeat it, if required, right?


Re: Problem using puppet scripts to configure bigtop on AmazonLinux

Posted by "Leidle, Rob" <le...@amazon.com>.

Yes -- at a bare minimum I would like to know if the provisioning/recipe has failed to complete successfully.



> On Dec 10, 2014, at 5:25 PM, Konstantin Boudnik <co...@apache.org> wrote:
> 
> Rob,
> 
> following on our IRC chat I will Cc here two guys from the community who know
> Puppet the best. Nate and Rich are likely to have the answer. Guys, if you can
> chime in on the topic - it'd be great!
> 
> To reiterate it: you are looking for a way to automatically tell if a recipe
> has failed and repeat it, if required, right?