You are viewing a plain text version of this content. The canonical link for it is here.
Posted to commits@whimsical.apache.org by Sam Ruby <ru...@apache.org> on 2016/02/04 16:31:49 UTC

[whimsy.git] [1/1] Commit 60f24d0: if all hosts are down, give up

Commit 60f24d0cab030cf6d64ce75c23d05b1ac33bdda7:
    if all hosts are down, give up


Branch: refs/heads/master
Author: Sam Ruby <ru...@intertwingly.net>
Committer: Sam Ruby <ru...@intertwingly.net>
Pusher: rubys <ru...@apache.org>

------------------------------------------------------------
lib/whimsy/asf/ldap.rb                                       | ++ --
------------------------------------------------------------
4 changes: 2 additions, 2 deletions.
------------------------------------------------------------


diff --git a/lib/whimsy/asf/ldap.rb b/lib/whimsy/asf/ldap.rb
index 0376c64..f73646b 100644
--- a/lib/whimsy/asf/ldap.rb
+++ b/lib/whimsy/asf/ldap.rb
@@ -73,7 +73,7 @@ def self.puppet_ldapservers
 
     # connect to LDAP
     def self.connect
-      loop do
+      host.length.times do
         host = next_host
         Wunderbar.info "Connecting to LDAP server: #{host}"
 
@@ -97,8 +97,8 @@ def self.connect
           Wunderbar.warn "Error connecting to LDAP server #{host}: " +
             re.message + " (continuing)"
         end
-
       end
+
       Wunderbar.error "Failed to connect to any LDAP host"
       return nil
     end

Re: [whimsy.git] [1/1] Commit 60f24d0: if all hosts are down, give up

Posted by sebb <se...@gmail.com>.
OK.

I think we should aim for the following minimum.
* connect will never try each host more than once at each call.
* connect will try the hosts in round-robin sequence.

This should be simple to implement, and will work for both.
Though it's unnecessary to wrap-round for one-shot apps.

It's also quite close to the earlier behaviour - the only difference
being that connects are tried round-robin rather than always starting
at the beginning.
This should make query retries more likely to succeed if the first
host connects OK but the query then fails.


On 4 February 2016 at 23:29, Sam Ruby <ru...@intertwingly.net> wrote:
> I'm not near a computer at the moment. Feel free to revert but please don't
> leave it at a place where it doesn't work for long running apps. Thanks
> On Feb 4, 2016 4:47 PM, "sebb" <se...@gmail.com> wrote:
>
>> On 4 February 2016 at 20:42, Sam Ruby <ru...@intertwingly.net> wrote:
>> > On Thu, Feb 4, 2016 at 3:15 PM, sebb <se...@gmail.com> wrote:
>> >> On 4 February 2016 at 19:45, Sam Ruby <ru...@intertwingly.net> wrote:
>> >>> On Thu, Feb 4, 2016 at 12:55 PM, sebb <se...@gmail.com> wrote:
>> >>>> On 4 February 2016 at 15:31, Sam Ruby <ru...@apache.org> wrote:
>> >>>>> Commit 60f24d0cab030cf6d64ce75c23d05b1ac33bdda7:
>> >>>>>     if all hosts are down, give up
>> >>>>>
>> >>>>> Branch: refs/heads/master
>> >>>>> Author: Sam Ruby <ru...@intertwingly.net>
>> >>>>> Committer: Sam Ruby <ru...@intertwingly.net>
>> >>>>> Pusher: rubys <ru...@apache.org>
>> >>>>>
>> >>>>> ------------------------------------------------------------
>> >>>>> lib/whimsy/asf/ldap.rb                                       | ++ --
>> >>>>> ------------------------------------------------------------
>> >>>>> 4 changes: 2 additions, 2 deletions.
>> >>>>> ------------------------------------------------------------
>> >>>>>
>> >>>>>
>> >>>>> diff --git a/lib/whimsy/asf/ldap.rb b/lib/whimsy/asf/ldap.rb
>> >>>>> index 0376c64..f73646b 100644
>> >>>>> --- a/lib/whimsy/asf/ldap.rb
>> >>>>> +++ b/lib/whimsy/asf/ldap.rb
>> >>>>> @@ -73,7 +73,7 @@ def self.puppet_ldapservers
>> >>>>>
>> >>>>>      # connect to LDAP
>> >>>>>      def self.connect
>> >>>>> -      loop do
>> >>>>> +      host.length.times do
>> >>>>
>> >>>> -1
>> >>>>
>> >>>> The original code works fine.
>> >>>>
>> >>>> The new loop tries to loop some 37 times (depending on the size of the
>> >>>> host name) and then fails with
>> >>>>
>> >>>>      iteration reached an end (StopIteration)
>> >>>>
>> >>>> I think the original code works rather better.
>> >>>
>> >>> Can you explain how this works for me?
>> >>>
>> >>> What I see is a "loop do...end" with the only exit being on a success,
>> >>> and the code at the end of this method being unreachable.
>> >>
>> >> loop/end works in conjunction with next_host which is an enum.
>> >> enum.next generates StopIteration which is specifically caught by
>> loop/end.
>> >>
>> >> http://ruby-doc.org/core-1.9.3/StopIteration.html
>> >
>> > That feels a bit too tightly coupled to me.  Having one method's
>> > implementation depend on another method's implementation and all that.
>>
>> Seems reasonable to me - it offers plenty of flexibility in designing
>> loops as the getter can be anywhere in the loop.
>> Otherwise the getter has to be part of the condition checking at the
>> start or end.
>>
>> > But be that as it may, I missed the fact that once you run out of your
>> > enumeration, it is game over.  That doesn't work well with long
>> > running applications, like the board agenda.  37 failures and it is
>> > time to restart the server.
>>
>> In fact it's about 7 failures (or however many are in the host list)
>>
>> The loop maximum was 37 (approx) but it will only try each host once.
>>
>> > That doesn't work for me.
>> >
>> >>> I will say that whether there are 4 hosts or 50 hosts, trying exactly
>> >>> 37 times (depending on the length of an unrelated string) seems kinda
>> >>> weird.
>> >>
>> >> It does it 37 or so times because host holds the host URL, and that is
>> >> roughly its length.
>> >
>> > That seems totally random.
>>
>> > I could understand trying each host once.
>>
>> Which is what it does.
>>
>> > Or trying each host twice.  Or trying something like [hosts.length,
>> > 3].min times.  But trying a number of times based on a url length?
>>
>> See above, 37 approx is the *max* loop count. I think you
>> misunderstood me earlier.
>>
>> >> The host is the wrong variable to use.
>>
>> Hence the peculiar loop count.
>>
>> >> But so is the hosts array wrong, because that will always return the
>> >> original number of hosts, even if some have already been used up and
>> >> we are resuming the loop.
>> >> The loop will then fall off the end of the enum, generating a
>> >> StopIteration, which is not caught by the x.length.times do loop.
>> >
>> > I would rather next_host continuously provide hosts.
>>
>> OK, but this is a different design.
>>
>> My intention was to ensure that hosts that have already failed are not
>> tried again.
>> This makes sense for all the apps I have worked on.
>>
>> However I agree that does not sit well with long-running apps.
>>
>> > When it runs out
>> > of a batch, it starts a new set.  I was too quick in scanning the code
>> > and presumed that's what the @he ||= was doing.
>> >
>> > Perhaps something like the following instead:
>> >
>> >        @he = hosts.dup unless @he and not @he.empty?
>> >        @he.pop
>> >
>> > The net effect is that it would try each host in turn.  If there later
>> > is a failure, it will start over and loop back around if necessary
>> > until it finds a host that works.
>> >
>> > Thoughts?
>>
>> The host re-use behaviour should be selectable.
>>
>> There's no point looping round all the hosts again for the one-shot
>> apps that are going to be run again soon.
>> If all the hosts have failed, the likelihood is that they will do so
>> again if tried again immediately.
>> So just give up and hope the fault has cleared for the next run.
>>
>> For long-running apps, the situation is a bit different.
>> It's fine if each host is tried multiple times, so long as that
>> happens over a long period.
>> However there needs to be a guard against repeatedly trying all the
>> hosts in rapid succession.
>>
>> So I don't think simply resetting the list will do; there needs to be
>> a time delay.
>> Nor does it make sense to always pause before restarting.
>> Perhaps it would work to store the time when the list was last
>> started, and add a wait if necessary before restarting.
>> This would stop endless rapid recycling, but would still not be ideal
>> behaviour e.g. if the external network was down.
>>
>> Is there a stage when even the long-running apps need to give up?
>> And if so, how is that detected?
>>
>> Note: there was no auto retry at all until recently, so what do the
>> long-running apps do currently?
>> Do they try to recover from LDAP queries?
>>
>> If so, it seems more likely that the current design (StopIteration
>> generated and not caught by connect) is less likely to play well with
>> any recovery.
>>
>> So whatever the case, I think the change needs to be reverted.
>>
>> We can then improve the host re-use strategy for the long-running apps.
>>
>> > - Sam Ruby
>> >
>> >>> And failing with a StopIteration is less desirable than producing a
>> >>> clean message, such as "Failed to connect to any LDAP host".  I can
>> >>> see the advantage of raising an exception over returning nil, but I
>> >>> would prefer a more meaningful message.
>> >>
>> >> It only *fails* with StopIteration with the updated code.
>> >> That is because the loop count is too large, so next_host is called
>> >> more times than there are hosts.
>> >>
>> >> With the previous code, next_host is also called one extra time, but
>> >> the generated StopIteration is caught by the loop/do, as designed.
>> >>
>> >> http://ruby-doc.org/core-1.9.3/StopIteration.html
>> >>
>> >> Try defining single failing URL in your .whimsy file and you will see
>> >> what I mean.
>> >>
>> >>> - Sam Ruby
>> >>>
>> >>>
>> >>>>>          host = next_host
>> >>>>>          Wunderbar.info "Connecting to LDAP server: #{host}"
>> >>>>>
>> >>>>> @@ -97,8 +97,8 @@ def self.connect
>> >>>>>            Wunderbar.warn "Error connecting to LDAP server #{host}:
>> " +
>> >>>>>              re.message + " (continuing)"
>> >>>>>          end
>> >>>>> -
>> >>>>>        end
>> >>>>> +
>> >>>>>        Wunderbar.error "Failed to connect to any LDAP host"
>> >>>>>        return nil
>> >>>>>      end
>>

Re: [whimsy.git] [1/1] Commit 60f24d0: if all hosts are down, give up

Posted by Sam Ruby <ru...@intertwingly.net>.
I'm not near a computer at the moment. Feel free to revert but please don't
leave it at a place where it doesn't work for long running apps. Thanks
On Feb 4, 2016 4:47 PM, "sebb" <se...@gmail.com> wrote:

> On 4 February 2016 at 20:42, Sam Ruby <ru...@intertwingly.net> wrote:
> > On Thu, Feb 4, 2016 at 3:15 PM, sebb <se...@gmail.com> wrote:
> >> On 4 February 2016 at 19:45, Sam Ruby <ru...@intertwingly.net> wrote:
> >>> On Thu, Feb 4, 2016 at 12:55 PM, sebb <se...@gmail.com> wrote:
> >>>> On 4 February 2016 at 15:31, Sam Ruby <ru...@apache.org> wrote:
> >>>>> Commit 60f24d0cab030cf6d64ce75c23d05b1ac33bdda7:
> >>>>>     if all hosts are down, give up
> >>>>>
> >>>>> Branch: refs/heads/master
> >>>>> Author: Sam Ruby <ru...@intertwingly.net>
> >>>>> Committer: Sam Ruby <ru...@intertwingly.net>
> >>>>> Pusher: rubys <ru...@apache.org>
> >>>>>
> >>>>> ------------------------------------------------------------
> >>>>> lib/whimsy/asf/ldap.rb                                       | ++ --
> >>>>> ------------------------------------------------------------
> >>>>> 4 changes: 2 additions, 2 deletions.
> >>>>> ------------------------------------------------------------
> >>>>>
> >>>>>
> >>>>> diff --git a/lib/whimsy/asf/ldap.rb b/lib/whimsy/asf/ldap.rb
> >>>>> index 0376c64..f73646b 100644
> >>>>> --- a/lib/whimsy/asf/ldap.rb
> >>>>> +++ b/lib/whimsy/asf/ldap.rb
> >>>>> @@ -73,7 +73,7 @@ def self.puppet_ldapservers
> >>>>>
> >>>>>      # connect to LDAP
> >>>>>      def self.connect
> >>>>> -      loop do
> >>>>> +      host.length.times do
> >>>>
> >>>> -1
> >>>>
> >>>> The original code works fine.
> >>>>
> >>>> The new loop tries to loop some 37 times (depending on the size of the
> >>>> host name) and then fails with
> >>>>
> >>>>      iteration reached an end (StopIteration)
> >>>>
> >>>> I think the original code works rather better.
> >>>
> >>> Can you explain how this works for me?
> >>>
> >>> What I see is a "loop do...end" with the only exit being on a success,
> >>> and the code at the end of this method being unreachable.
> >>
> >> loop/end works in conjunction with next_host which is an enum.
> >> enum.next generates StopIteration which is specifically caught by
> loop/end.
> >>
> >> http://ruby-doc.org/core-1.9.3/StopIteration.html
> >
> > That feels a bit too tightly coupled to me.  Having one method's
> > implementation depend on another method's implementation and all that.
>
> Seems reasonable to me - it offers plenty of flexibility in designing
> loops as the getter can be anywhere in the loop.
> Otherwise the getter has to be part of the condition checking at the
> start or end.
>
> > But be that as it may, I missed the fact that once you run out of your
> > enumeration, it is game over.  That doesn't work well with long
> > running applications, like the board agenda.  37 failures and it is
> > time to restart the server.
>
> In fact it's about 7 failures (or however many are in the host list)
>
> The loop maximum was 37 (approx) but it will only try each host once.
>
> > That doesn't work for me.
> >
> >>> I will say that whether there are 4 hosts or 50 hosts, trying exactly
> >>> 37 times (depending on the length of an unrelated string) seems kinda
> >>> weird.
> >>
> >> It does it 37 or so times because host holds the host URL, and that is
> >> roughly its length.
> >
> > That seems totally random.
>
> > I could understand trying each host once.
>
> Which is what it does.
>
> > Or trying each host twice.  Or trying something like [hosts.length,
> > 3].min times.  But trying a number of times based on a url length?
>
> See above, 37 approx is the *max* loop count. I think you
> misunderstood me earlier.
>
> >> The host is the wrong variable to use.
>
> Hence the peculiar loop count.
>
> >> But so is the hosts array wrong, because that will always return the
> >> original number of hosts, even if some have already been used up and
> >> we are resuming the loop.
> >> The loop will then fall off the end of the enum, generating a
> >> StopIteration, which is not caught by the x.length.times do loop.
> >
> > I would rather next_host continuously provide hosts.
>
> OK, but this is a different design.
>
> My intention was to ensure that hosts that have already failed are not
> tried again.
> This makes sense for all the apps I have worked on.
>
> However I agree that does not sit well with long-running apps.
>
> > When it runs out
> > of a batch, it starts a new set.  I was too quick in scanning the code
> > and presumed that's what the @he ||= was doing.
> >
> > Perhaps something like the following instead:
> >
> >        @he = hosts.dup unless @he and not @he.empty?
> >        @he.pop
> >
> > The net effect is that it would try each host in turn.  If there later
> > is a failure, it will start over and loop back around if necessary
> > until it finds a host that works.
> >
> > Thoughts?
>
> The host re-use behaviour should be selectable.
>
> There's no point looping round all the hosts again for the one-shot
> apps that are going to be run again soon.
> If all the hosts have failed, the likelihood is that they will do so
> again if tried again immediately.
> So just give up and hope the fault has cleared for the next run.
>
> For long-running apps, the situation is a bit different.
> It's fine if each host is tried multiple times, so long as that
> happens over a long period.
> However there needs to be a guard against repeatedly trying all the
> hosts in rapid succession.
>
> So I don't think simply resetting the list will do; there needs to be
> a time delay.
> Nor does it make sense to always pause before restarting.
> Perhaps it would work to store the time when the list was last
> started, and add a wait if necessary before restarting.
> This would stop endless rapid recycling, but would still not be ideal
> behaviour e.g. if the external network was down.
>
> Is there a stage when even the long-running apps need to give up?
> And if so, how is that detected?
>
> Note: there was no auto retry at all until recently, so what do the
> long-running apps do currently?
> Do they try to recover from LDAP queries?
>
> If so, it seems more likely that the current design (StopIteration
> generated and not caught by connect) is less likely to play well with
> any recovery.
>
> So whatever the case, I think the change needs to be reverted.
>
> We can then improve the host re-use strategy for the long-running apps.
>
> > - Sam Ruby
> >
> >>> And failing with a StopIteration is less desirable than producing a
> >>> clean message, such as "Failed to connect to any LDAP host".  I can
> >>> see the advantage of raising an exception over returning nil, but I
> >>> would prefer a more meaningful message.
> >>
> >> It only *fails* with StopIteration with the updated code.
> >> That is because the loop count is too large, so next_host is called
> >> more times than there are hosts.
> >>
> >> With the previous code, next_host is also called one extra time, but
> >> the generated StopIteration is caught by the loop/do, as designed.
> >>
> >> http://ruby-doc.org/core-1.9.3/StopIteration.html
> >>
> >> Try defining single failing URL in your .whimsy file and you will see
> >> what I mean.
> >>
> >>> - Sam Ruby
> >>>
> >>>
> >>>>>          host = next_host
> >>>>>          Wunderbar.info "Connecting to LDAP server: #{host}"
> >>>>>
> >>>>> @@ -97,8 +97,8 @@ def self.connect
> >>>>>            Wunderbar.warn "Error connecting to LDAP server #{host}:
> " +
> >>>>>              re.message + " (continuing)"
> >>>>>          end
> >>>>> -
> >>>>>        end
> >>>>> +
> >>>>>        Wunderbar.error "Failed to connect to any LDAP host"
> >>>>>        return nil
> >>>>>      end
>

Re: [whimsy.git] [1/1] Commit 60f24d0: if all hosts are down, give up

Posted by sebb <se...@gmail.com>.
On 4 February 2016 at 20:42, Sam Ruby <ru...@intertwingly.net> wrote:
> On Thu, Feb 4, 2016 at 3:15 PM, sebb <se...@gmail.com> wrote:
>> On 4 February 2016 at 19:45, Sam Ruby <ru...@intertwingly.net> wrote:
>>> On Thu, Feb 4, 2016 at 12:55 PM, sebb <se...@gmail.com> wrote:
>>>> On 4 February 2016 at 15:31, Sam Ruby <ru...@apache.org> wrote:
>>>>> Commit 60f24d0cab030cf6d64ce75c23d05b1ac33bdda7:
>>>>>     if all hosts are down, give up
>>>>>
>>>>> Branch: refs/heads/master
>>>>> Author: Sam Ruby <ru...@intertwingly.net>
>>>>> Committer: Sam Ruby <ru...@intertwingly.net>
>>>>> Pusher: rubys <ru...@apache.org>
>>>>>
>>>>> ------------------------------------------------------------
>>>>> lib/whimsy/asf/ldap.rb                                       | ++ --
>>>>> ------------------------------------------------------------
>>>>> 4 changes: 2 additions, 2 deletions.
>>>>> ------------------------------------------------------------
>>>>>
>>>>>
>>>>> diff --git a/lib/whimsy/asf/ldap.rb b/lib/whimsy/asf/ldap.rb
>>>>> index 0376c64..f73646b 100644
>>>>> --- a/lib/whimsy/asf/ldap.rb
>>>>> +++ b/lib/whimsy/asf/ldap.rb
>>>>> @@ -73,7 +73,7 @@ def self.puppet_ldapservers
>>>>>
>>>>>      # connect to LDAP
>>>>>      def self.connect
>>>>> -      loop do
>>>>> +      host.length.times do
>>>>
>>>> -1
>>>>
>>>> The original code works fine.
>>>>
>>>> The new loop tries to loop some 37 times (depending on the size of the
>>>> host name) and then fails with
>>>>
>>>>      iteration reached an end (StopIteration)
>>>>
>>>> I think the original code works rather better.
>>>
>>> Can you explain how this works for me?
>>>
>>> What I see is a "loop do...end" with the only exit being on a success,
>>> and the code at the end of this method being unreachable.
>>
>> loop/end works in conjunction with next_host which is an enum.
>> enum.next generates StopIteration which is specifically caught by loop/end.
>>
>> http://ruby-doc.org/core-1.9.3/StopIteration.html
>
> That feels a bit too tightly coupled to me.  Having one method's
> implementation depend on another method's implementation and all that.

Seems reasonable to me - it offers plenty of flexibility in designing
loops as the getter can be anywhere in the loop.
Otherwise the getter has to be part of the condition checking at the
start or end.

> But be that as it may, I missed the fact that once you run out of your
> enumeration, it is game over.  That doesn't work well with long
> running applications, like the board agenda.  37 failures and it is
> time to restart the server.

In fact it's about 7 failures (or however many are in the host list)

The loop maximum was 37 (approx) but it will only try each host once.

> That doesn't work for me.
>
>>> I will say that whether there are 4 hosts or 50 hosts, trying exactly
>>> 37 times (depending on the length of an unrelated string) seems kinda
>>> weird.
>>
>> It does it 37 or so times because host holds the host URL, and that is
>> roughly its length.
>
> That seems totally random.

> I could understand trying each host once.

Which is what it does.

> Or trying each host twice.  Or trying something like [hosts.length,
> 3].min times.  But trying a number of times based on a url length?

See above, 37 approx is the *max* loop count. I think you
misunderstood me earlier.

>> The host is the wrong variable to use.

Hence the peculiar loop count.

>> But so is the hosts array wrong, because that will always return the
>> original number of hosts, even if some have already been used up and
>> we are resuming the loop.
>> The loop will then fall off the end of the enum, generating a
>> StopIteration, which is not caught by the x.length.times do loop.
>
> I would rather next_host continuously provide hosts.

OK, but this is a different design.

My intention was to ensure that hosts that have already failed are not
tried again.
This makes sense for all the apps I have worked on.

However I agree that does not sit well with long-running apps.

> When it runs out
> of a batch, it starts a new set.  I was too quick in scanning the code
> and presumed that's what the @he ||= was doing.
>
> Perhaps something like the following instead:
>
>        @he = hosts.dup unless @he and not @he.empty?
>        @he.pop
>
> The net effect is that it would try each host in turn.  If there later
> is a failure, it will start over and loop back around if necessary
> until it finds a host that works.
>
> Thoughts?

The host re-use behaviour should be selectable.

There's no point looping round all the hosts again for the one-shot
apps that are going to be run again soon.
If all the hosts have failed, the likelihood is that they will do so
again if tried again immediately.
So just give up and hope the fault has cleared for the next run.

For long-running apps, the situation is a bit different.
It's fine if each host is tried multiple times, so long as that
happens over a long period.
However there needs to be a guard against repeatedly trying all the
hosts in rapid succession.

So I don't think simply resetting the list will do; there needs to be
a time delay.
Nor does it make sense to always pause before restarting.
Perhaps it would work to store the time when the list was last
started, and add a wait if necessary before restarting.
This would stop endless rapid recycling, but would still not be ideal
behaviour e.g. if the external network was down.

Is there a stage when even the long-running apps need to give up?
And if so, how is that detected?

Note: there was no auto retry at all until recently, so what do the
long-running apps do currently?
Do they try to recover from LDAP queries?

If so, it seems more likely that the current design (StopIteration
generated and not caught by connect) is less likely to play well with
any recovery.

So whatever the case, I think the change needs to be reverted.

We can then improve the host re-use strategy for the long-running apps.

> - Sam Ruby
>
>>> And failing with a StopIteration is less desirable than producing a
>>> clean message, such as "Failed to connect to any LDAP host".  I can
>>> see the advantage of raising an exception over returning nil, but I
>>> would prefer a more meaningful message.
>>
>> It only *fails* with StopIteration with the updated code.
>> That is because the loop count is too large, so next_host is called
>> more times than there are hosts.
>>
>> With the previous code, next_host is also called one extra time, but
>> the generated StopIteration is caught by the loop/do, as designed.
>>
>> http://ruby-doc.org/core-1.9.3/StopIteration.html
>>
>> Try defining single failing URL in your .whimsy file and you will see
>> what I mean.
>>
>>> - Sam Ruby
>>>
>>>
>>>>>          host = next_host
>>>>>          Wunderbar.info "Connecting to LDAP server: #{host}"
>>>>>
>>>>> @@ -97,8 +97,8 @@ def self.connect
>>>>>            Wunderbar.warn "Error connecting to LDAP server #{host}: " +
>>>>>              re.message + " (continuing)"
>>>>>          end
>>>>> -
>>>>>        end
>>>>> +
>>>>>        Wunderbar.error "Failed to connect to any LDAP host"
>>>>>        return nil
>>>>>      end

Re: [whimsy.git] [1/1] Commit 60f24d0: if all hosts are down, give up

Posted by Sam Ruby <ru...@intertwingly.net>.
On Thu, Feb 4, 2016 at 3:15 PM, sebb <se...@gmail.com> wrote:
> On 4 February 2016 at 19:45, Sam Ruby <ru...@intertwingly.net> wrote:
>> On Thu, Feb 4, 2016 at 12:55 PM, sebb <se...@gmail.com> wrote:
>>> On 4 February 2016 at 15:31, Sam Ruby <ru...@apache.org> wrote:
>>>> Commit 60f24d0cab030cf6d64ce75c23d05b1ac33bdda7:
>>>>     if all hosts are down, give up
>>>>
>>>> Branch: refs/heads/master
>>>> Author: Sam Ruby <ru...@intertwingly.net>
>>>> Committer: Sam Ruby <ru...@intertwingly.net>
>>>> Pusher: rubys <ru...@apache.org>
>>>>
>>>> ------------------------------------------------------------
>>>> lib/whimsy/asf/ldap.rb                                       | ++ --
>>>> ------------------------------------------------------------
>>>> 4 changes: 2 additions, 2 deletions.
>>>> ------------------------------------------------------------
>>>>
>>>>
>>>> diff --git a/lib/whimsy/asf/ldap.rb b/lib/whimsy/asf/ldap.rb
>>>> index 0376c64..f73646b 100644
>>>> --- a/lib/whimsy/asf/ldap.rb
>>>> +++ b/lib/whimsy/asf/ldap.rb
>>>> @@ -73,7 +73,7 @@ def self.puppet_ldapservers
>>>>
>>>>      # connect to LDAP
>>>>      def self.connect
>>>> -      loop do
>>>> +      host.length.times do
>>>
>>> -1
>>>
>>> The original code works fine.
>>>
>>> The new loop tries to loop some 37 times (depending on the size of the
>>> host name) and then fails with
>>>
>>>      iteration reached an end (StopIteration)
>>>
>>> I think the original code works rather better.
>>
>> Can you explain how this works for me?
>>
>> What I see is a "loop do...end" with the only exit being on a success,
>> and the code at the end of this method being unreachable.
>
> loop/end works in conjunction with next_host which is an enum.
> enum.next generates StopIteration which is specifically caught by loop/end.
>
> http://ruby-doc.org/core-1.9.3/StopIteration.html

That feels a bit too tightly coupled to me.  Having one method's
implementation depend on another method's implementation and all that.

But be that as it may, I missed the fact that once you run out of your
enumeration, it is game over.  That doesn't work well with long
running applications, like the board agenda.  37 failures and it is
time to restart the server.

That doesn't work for me.

>> I will say that whether there are 4 hosts or 50 hosts, trying exactly
>> 37 times (depending on the length of an unrelated string) seems kinda
>> weird.
>
> It does it 37 or so times because host holds the host URL, and that is
> roughly its length.

That seems totally random.  I could understand trying each host once.
Or trying each host twice.  Or trying something like [hosts.length,
3].min times.  But trying a number of times based on a url length?

> The host is the wrong variable to use.
> But so is the hosts array wrong, because that will always return the
> original number of hosts, even if some have already been used up and
> we are resuming the loop.
> The loop will then fall off the end of the enum, generating a
> StopIteration, which is not caught by the x.length.times do loop.

I would rather next_host continuously provide hosts.  When it runs out
of a batch, it starts a new set.  I was too quick in scanning the code
and presumed that's what the @he ||= was doing.

Perhaps something like the following instead:

       @he = hosts.dup unless @he and not @he.empty?
       @he.pop

The net effect is that it would try each host in turn.  If there later
is a failure, it will start over and loop back around if necessary
until it finds a host that works.

Thoughts?

- Sam Ruby

>> And failing with a StopIteration is less desirable than producing a
>> clean message, such as "Failed to connect to any LDAP host".  I can
>> see the advantage of raising an exception over returning nil, but I
>> would prefer a more meaningful message.
>
> It only *fails* with StopIteration with the updated code.
> That is because the loop count is too large, so next_host is called
> more times than there are hosts.
>
> With the previous code, next_host is also called one extra time, but
> the generated StopIteration is caught by the loop/do, as designed.
>
> http://ruby-doc.org/core-1.9.3/StopIteration.html
>
> Try defining single failing URL in your .whimsy file and you will see
> what I mean.
>
>> - Sam Ruby
>>
>>
>>>>          host = next_host
>>>>          Wunderbar.info "Connecting to LDAP server: #{host}"
>>>>
>>>> @@ -97,8 +97,8 @@ def self.connect
>>>>            Wunderbar.warn "Error connecting to LDAP server #{host}: " +
>>>>              re.message + " (continuing)"
>>>>          end
>>>> -
>>>>        end
>>>> +
>>>>        Wunderbar.error "Failed to connect to any LDAP host"
>>>>        return nil
>>>>      end

Re: [whimsy.git] [1/1] Commit 60f24d0: if all hosts are down, give up

Posted by sebb <se...@gmail.com>.
On 4 February 2016 at 19:45, Sam Ruby <ru...@intertwingly.net> wrote:
> On Thu, Feb 4, 2016 at 12:55 PM, sebb <se...@gmail.com> wrote:
>> On 4 February 2016 at 15:31, Sam Ruby <ru...@apache.org> wrote:
>>> Commit 60f24d0cab030cf6d64ce75c23d05b1ac33bdda7:
>>>     if all hosts are down, give up
>>>
>>> Branch: refs/heads/master
>>> Author: Sam Ruby <ru...@intertwingly.net>
>>> Committer: Sam Ruby <ru...@intertwingly.net>
>>> Pusher: rubys <ru...@apache.org>
>>>
>>> ------------------------------------------------------------
>>> lib/whimsy/asf/ldap.rb                                       | ++ --
>>> ------------------------------------------------------------
>>> 4 changes: 2 additions, 2 deletions.
>>> ------------------------------------------------------------
>>>
>>>
>>> diff --git a/lib/whimsy/asf/ldap.rb b/lib/whimsy/asf/ldap.rb
>>> index 0376c64..f73646b 100644
>>> --- a/lib/whimsy/asf/ldap.rb
>>> +++ b/lib/whimsy/asf/ldap.rb
>>> @@ -73,7 +73,7 @@ def self.puppet_ldapservers
>>>
>>>      # connect to LDAP
>>>      def self.connect
>>> -      loop do
>>> +      host.length.times do
>>
>> -1
>>
>> The original code works fine.
>>
>> The new loop tries to loop some 37 times (depending on the size of the
>> host name) and then fails with
>>
>>      iteration reached an end (StopIteration)
>>
>> I think the original code works rather better.
>
> Can you explain how this works for me?
>
> What I see is a "loop do...end" with the only exit being on a success,
> and the code at the end of this method being unreachable.

loop/end works in conjunction with next_host which is an enum.
enum.next generates StopIteration which is specifically caught by loop/end.

http://ruby-doc.org/core-1.9.3/StopIteration.html

> I will say that whether there are 4 hosts or 50 hosts, trying exactly
> 37 times (depending on the length of an unrelated string) seems kinda
> weird.

It does it 37 or so times because host holds the host URL, and that is
roughly its length.
The host is the wrong variable to use.
But so is the hosts array wrong, because that will always return the
original number of hosts, even if some have already been used up and
we are resuming the loop.
The loop will then fall off the end of the enum, generating a
StopIteration, which is not caught by the x.length.times do loop.

> And failing with a StopIteration is less desirable than producing a
> clean message, such as "Failed to connect to any LDAP host".  I can
> see the advantage of raising an exception over returning nil, but I
> would prefer a more meaningful message.

It only *fails* with StopIteration with the updated code.
That is because the loop count is too large, so next_host is called
more times than there are hosts.

With the previous code, next_host is also called one extra time, but
the generated StopIteration is caught by the loop/do, as designed.

http://ruby-doc.org/core-1.9.3/StopIteration.html

Try defining single failing URL in your .whimsy file and you will see
what I mean.

> - Sam Ruby
>
>
>>>          host = next_host
>>>          Wunderbar.info "Connecting to LDAP server: #{host}"
>>>
>>> @@ -97,8 +97,8 @@ def self.connect
>>>            Wunderbar.warn "Error connecting to LDAP server #{host}: " +
>>>              re.message + " (continuing)"
>>>          end
>>> -
>>>        end
>>> +
>>>        Wunderbar.error "Failed to connect to any LDAP host"
>>>        return nil
>>>      end

Re: [whimsy.git] [1/1] Commit 60f24d0: if all hosts are down, give up

Posted by Sam Ruby <ru...@intertwingly.net>.
On Thu, Feb 4, 2016 at 12:55 PM, sebb <se...@gmail.com> wrote:
> On 4 February 2016 at 15:31, Sam Ruby <ru...@apache.org> wrote:
>> Commit 60f24d0cab030cf6d64ce75c23d05b1ac33bdda7:
>>     if all hosts are down, give up
>>
>> Branch: refs/heads/master
>> Author: Sam Ruby <ru...@intertwingly.net>
>> Committer: Sam Ruby <ru...@intertwingly.net>
>> Pusher: rubys <ru...@apache.org>
>>
>> ------------------------------------------------------------
>> lib/whimsy/asf/ldap.rb                                       | ++ --
>> ------------------------------------------------------------
>> 4 changes: 2 additions, 2 deletions.
>> ------------------------------------------------------------
>>
>>
>> diff --git a/lib/whimsy/asf/ldap.rb b/lib/whimsy/asf/ldap.rb
>> index 0376c64..f73646b 100644
>> --- a/lib/whimsy/asf/ldap.rb
>> +++ b/lib/whimsy/asf/ldap.rb
>> @@ -73,7 +73,7 @@ def self.puppet_ldapservers
>>
>>      # connect to LDAP
>>      def self.connect
>> -      loop do
>> +      host.length.times do
>
> -1
>
> The original code works fine.
>
> The new loop tries to loop some 37 times (depending on the size of the
> host name) and then fails with
>
>      iteration reached an end (StopIteration)
>
> I think the original code works rather better.

Can you explain how this works for me?

What I see is a "loop do...end" with the only exit being on a success,
and the code at the end of this method being unreachable.

I will say that whether there are 4 hosts or 50 hosts, trying exactly
37 times (depending on the length of an unrelated string) seems kinda
weird.

And failing with a StopIteration is less desirable than producing a
clean message, such as "Failed to connect to any LDAP host".  I can
see the advantage of raising an exception over returning nil, but I
would prefer a more meaningful message.

- Sam Ruby


>>          host = next_host
>>          Wunderbar.info "Connecting to LDAP server: #{host}"
>>
>> @@ -97,8 +97,8 @@ def self.connect
>>            Wunderbar.warn "Error connecting to LDAP server #{host}: " +
>>              re.message + " (continuing)"
>>          end
>> -
>>        end
>> +
>>        Wunderbar.error "Failed to connect to any LDAP host"
>>        return nil
>>      end

Re: [whimsy.git] [1/1] Commit 60f24d0: if all hosts are down, give up

Posted by sebb <se...@gmail.com>.
On 4 February 2016 at 15:31, Sam Ruby <ru...@apache.org> wrote:
> Commit 60f24d0cab030cf6d64ce75c23d05b1ac33bdda7:
>     if all hosts are down, give up
>
>
> Branch: refs/heads/master
> Author: Sam Ruby <ru...@intertwingly.net>
> Committer: Sam Ruby <ru...@intertwingly.net>
> Pusher: rubys <ru...@apache.org>
>
> ------------------------------------------------------------
> lib/whimsy/asf/ldap.rb                                       | ++ --
> ------------------------------------------------------------
> 4 changes: 2 additions, 2 deletions.
> ------------------------------------------------------------
>
>
> diff --git a/lib/whimsy/asf/ldap.rb b/lib/whimsy/asf/ldap.rb
> index 0376c64..f73646b 100644
> --- a/lib/whimsy/asf/ldap.rb
> +++ b/lib/whimsy/asf/ldap.rb
> @@ -73,7 +73,7 @@ def self.puppet_ldapservers
>
>      # connect to LDAP
>      def self.connect
> -      loop do
> +      host.length.times do

-1

The original code works fine.

The new loop tries to loop some 37 times (depending on the size of the
host name) and then fails with

     iteration reached an end (StopIteration)

I think the original code works rather better.

>          host = next_host
>          Wunderbar.info "Connecting to LDAP server: #{host}"
>
> @@ -97,8 +97,8 @@ def self.connect
>            Wunderbar.warn "Error connecting to LDAP server #{host}: " +
>              re.message + " (continuing)"
>          end
> -
>        end
> +
>        Wunderbar.error "Failed to connect to any LDAP host"
>        return nil
>      end