You are viewing a plain text version of this content. The canonical link for it is here.

Posted to dev@mynewt.apache.org by Sterling Hughes <st...@apache.org> on 2016/09/23 05:58:22 UTC

Re: hal watchdog

Hey \u2014

A follow up on this, I\u2019ve committed initial support for the Nordic 
platforms for the watchdog, along with modifying this API a bit.

I made the watchdog expiry a millisecond value (in hal_watchdog_init()), 
as pretty much every watchdog I\u2019ve seen executes in millisecond 
resolution.  I removed the hal_watchdog_stop() function, as many 
processors don\u2019t offer the ability to stop the watchdog once started.

I also hooked the watchdog into the OS with two new configuration 
options:

     SANITY_INTERVAL:
         description: 'The interval at which the sanity checks should 
run, should be at least 200ms prior to watchdog'
         value: 299500
     WATCHDOG_INTERVAL:
         description: 'The interval at which the watchdog should reset 
if not tickled, in ms'
         value: 300000

These are default values, defined by the OS (libs/os/pkg.yml), that can 
be overridden by the BSP.

By default the OS initializes the watchdog with hal_watchdog_init() when 
OS start is called.  I have removed the sanity task (to save stack 
space, and make it run by default), and instead, put all sanity related 
functions within the idle task.

The logic is here (os.c), for reference.  os_sanity_run() calls into 
os_sanity.c, and runs the sanity checks.  Code has also been added to 
make sure that we don\u2019t sleep beyond the sanity interval, and trip up 
the watchdog in our idle loop.

     sanity_itvl_ticks = (MYNEWT_VAL(SANITY_INTERVAL) * 
OS_TICKS_PER_SEC) / 1000;
     sanity_last = 0;

     hal_watchdog_tickle();

     while (1) {
         ++g_os_idle_ctr;

         now = os_time_get();
         if (OS_TIME_TICK_GT(now, sanity_last + sanity_itvl_ticks)) {
             os_sanity_run();
             /* Tickle the watchdog after successfully running sanity */
             hal_watchdog_tickle();
             sanity_last = now;
         }

         OS_ENTER_CRITICAL(sr);
         now = os_time_get();
         sticks = os_sched_wakeup_ticks(now);
         cticks = os_callout_wakeup_ticks(now);
         iticks = min(sticks, cticks);
         /* Wakeup in time to run sanity as well from the idle context,
          * as the idle task does not schedule itself.
          */
         iticks = min(iticks, ((sanity_last + sanity_itvl_ticks) - 
now));

         if (iticks < MIN_IDLE_TICKS) {
             iticks = 0;
         } else if (iticks > MAX_IDLE_TICKS) {
             iticks = MAX_IDLE_TICKS;
         } else {
             /* NOTHING */
         }
         /* Tell the architecture specific support to put the processor 
to sleep
          * for 'n' ticks.
          */
         os_tick_idle(iticks);
         OS_EXIT_CRITICAL(sr);


For BSPs where hal_watchdog has not been implemented, calling these 
functions has no effect, and I\u2019ve added stubs to the stm32f4 and 
native BSPs.

Sterling


On 30 Aug 2016, at 15:01, marko kiiskila wrote:

>
>> On Aug 30, 2016, at 12:59 PM, Mathew Calmer 
>> <mc...@exploramednc7.com> wrote:
>>
>> On August 30, 2016 at 12:28:50 PM, will sanfilippo 
>> (wills@runtime.io<ma...@runtime.io>) wrote:
>> Sounds reasonable. As I am sure you know, doing it through the sanity 
>> task sometimes is an issue getting the time right as you would then 
>> need to know the worst-case timing of all the tasks that could be 
>> running\u2026 but any way you cut it, you have to put some time limit on 
>> that\u2026 in past lives I have seen some pretty complicated ways to 
>> deal with this but this seems reasonable and if developers need 
>> something different they can implement it with this hal.
>>
>>
>>
>> I would consider making the return value of init() be the time or 
>> some reference to the time that was actually set.  So, for example, 
>> if the user asks for 10000 ticks, and the system can only support 
>> 2000, it could return 2000 from init, after trying it\u2019s best to 
>> support the request.
>>
>> It could be done in powers of two or some other mechanism, but 
>> conceptually using that return value to explain what was actually set 
>> would be a nice interface.   If watchdog was not implemented on given 
>> hardware, default return could be negative (error) or 0, implying 
>> watchdog was not set (although 0 also implies success\u2026 so\u2026).
>>
>
> I was thinking the same regarding return value from init(), but had 
> not written it down in the
> API proposal.
>
> I\u2019m including updated version here.
>
> /*
>  * Set a recurring watchdog timer to fire no sooner than in 
> 'expire_secs'
>  * seconds. Watchdog should be tickled periodically with a frequency
>  * smaller than 'expire_secs'. Watchdog needs to be then started with
>  * a call to hal_watchdog_enable().
>  *
>  * @param expire_secs           Watchdog timer expiration time
>  *
>  * @return                      < 0 on failure; on success return the 
> actual
>  *                              expiration time as positive value
>  */
> int hal_watchdog_init(int expire_secs);
>
> /*
>  * Starts the watchdog.
>  *
>  * @return                      0 on success; non-zero on failure.
>  */
> int hal_watchdog_enable(void);
>
> /*
>  * Stops the watchdog.
>  *
>  * @return                      0 on success; non-zero on failure.
>  */
> int hal_watchdog_stop(void);
>
> /*
>  * Tickles the watchdog.
>  */
> void hal_watchdog_tickle(void);
>

Re: hal watchdog

Posted by will sanfilippo <wi...@runtime.io>.

Well, I have worked on systems where 200 or even 500 msecs would definitely have caused the code to crash. For example, you have to do lots of crypto processing and you dont have HW support. That being said, I think this is a moot point as it is configurable so all good.

I just like to hear myself talk sometimes :-)

> On Sep 23, 2016, at 11:03 AM, Mathew Calmer <mc...@exploramednc7.com> wrote:
> 
>> 
>> I also worry (slightly) that the default time between sanity waking up and the watchdog firing may not be enough. It is configurable so that is all good, but maybe the default should be a bit longer as sanity is checked in the idle task and if a system has lots of tasks the idle task may not run for a bit (although 200 msecs is a while).
>> 
> Feels to me that if your idle task is taking longer than 200 msecs to circle, you probably _want_ your watchdog kicking in.
> 
> 200 ms is really long.
> 
>> 
>> > On Sep 22, 2016, at 10:58 PM, Sterling Hughes <st...@apache.org> <mailto:sterling@apache.org%3E> wrote:
>> >
>> > Hey —
>> >
>> > A follow up on this, I’ve committed initial support for the Nordic platforms for the watchdog, along with modifying this API a bit.
>> >
>> > I made the watchdog expiry a millisecond value (in hal_watchdog_init()), as pretty much every watchdog I’ve seen executes in millisecond resolution. I removed the hal_watchdog_stop() function, as many processors don’t offer the ability to stop the watchdog once started.
>> >
>> > I also hooked the watchdog into the OS with two new configuration options:
>> >
>> > SANITY_INTERVAL:
>> > description: 'The interval at which the sanity checks should run, should be at least 200ms prior to watchdog'
>> > value: 299500
>> > WATCHDOG_INTERVAL:
>> > description: 'The interval at which the watchdog should reset if not tickled, in ms'
>> > value: 300000
>> >
>> > These are default values, defined by the OS (libs/os/pkg.yml), that can be overridden by the BSP.
>> >
>> > By default the OS initializes the watchdog with hal_watchdog_init() when OS start is called. I have removed the sanity task (to save stack space, and make it run by default), and instead, put all sanity related functions within the idle task.
>> >
>> > The logic is here (os.c), for reference. os_sanity_run() calls into os_sanity.c, and runs the sanity checks. Code has also been added to make sure that we don’t sleep beyond the sanity interval, and trip up the watchdog in our idle loop.
>> >
>> > sanity_itvl_ticks = (MYNEWT_VAL(SANITY_INTERVAL) * OS_TICKS_PER_SEC) / 1000;
>> > sanity_last = 0;
>> >
>> > hal_watchdog_tickle();
>> >
>> > while (1) {
>> > ++g_os_idle_ctr;
>> >
>> > now = os_time_get();
>> > if (OS_TIME_TICK_GT(now, sanity_last + sanity_itvl_ticks)) {
>> > os_sanity_run();
>> > /* Tickle the watchdog after successfully running sanity */
>> > hal_watchdog_tickle();
>> > sanity_last = now;
>> > }
>> >
>> > OS_ENTER_CRITICAL(sr);
>> > now = os_time_get();
>> > sticks = os_sched_wakeup_ticks(now);
>> > cticks = os_callout_wakeup_ticks(now);
>> > iticks = min(sticks, cticks);
>> > /* Wakeup in time to run sanity as well from the idle context,
>> > * as the idle task does not schedule itself.
>> > */
>> > iticks = min(iticks, ((sanity_last + sanity_itvl_ticks) - now));
>> >
>> > if (iticks < MIN_IDLE_TICKS) {
>> > iticks = 0;
>> > } else if (iticks > MAX_IDLE_TICKS) {
>> > iticks = MAX_IDLE_TICKS;
>> > } else {
>> > /* NOTHING */
>> > }
>> > /* Tell the architecture specific support to put the processor to sleep
>> > * for 'n' ticks.
>> > */
>> > os_tick_idle(iticks);
>> > OS_EXIT_CRITICAL(sr);
>> >
>> >
>> > For BSPs where hal_watchdog has not been implemented, calling these functions has no effect, and I’ve added stubs to the stm32f4 and native BSPs.
>> >
>> > Sterling
>> >
>> >
>> > On 30 Aug 2016, at 15:01, marko kiiskila wrote:
>> >
>> >>
>> >>> On Aug 30, 2016, at 12:59 PM, Mathew Calmer <mc...@exploramednc7.com> <mailto:mcalmer@exploramednc7.com%3E> wrote:
>> >>>
>> >>> On August 30, 2016 at 12:28:50 PM, will sanfilippo (wills@runtime.io <ma...@runtime.io> <mailto:wills@runtime.io%3E>) wrote:
>> >>> Sounds reasonable. As I am sure you know, doing it through the sanity task sometimes is an issue getting the time right as you would then need to know the worst-case timing of all the tasks that could be running… but any way you cut it, you have to put some time limit on that… in past lives I have seen some pretty complicated ways to deal with this but this seems reasonable and if developers need something different they can implement it with this hal.
>> >>>
>> >>>
>> >>>
>> >>> I would consider making the return value of init() be the time or some reference to the time that was actually set. So, for example, if the user asks for 10000 ticks, and the system can only support 2000, it could return 2000 from init, after trying it’s best to support the request.
>> >>>
>> >>> It could be done in powers of two or some other mechanism, but conceptually using that return value to explain what was actually set would be a nice interface. If watchdog was not implemented on given hardware, default return could be negative (error) or 0, implying watchdog was not set (although 0 also implies success… so…).
>> >>>
>> >>
>> >> I was thinking the same regarding return value from init(), but had not written it down in the
>> >> API proposal.
>> >>
>> >> I’m including updated version here.
>> >>
>> >> /*
>> >> * Set a recurring watchdog timer to fire no sooner than in 'expire_secs'
>> >> * seconds. Watchdog should be tickled periodically with a frequency
>> >> * smaller than 'expire_secs'. Watchdog needs to be then started with
>> >> * a call to hal_watchdog_enable().
>> >> *
>> >> * @param expire_secs Watchdog timer expiration time
>> >> *
>> >> * @return < 0 on failure; on success return the actual
>> >> * expiration time as positive value
>> >> */
>> >> int hal_watchdog_init(int expire_secs);
>> >>
>> >> /*
>> >> * Starts the watchdog.
>> >> *
>> >> * @return 0 on success; non-zero on failure.
>> >> */
>> >> int hal_watchdog_enable(void);
>> >>
>> >> /*
>> >> * Stops the watchdog.
>> >> *
>> >> * @return 0 on success; non-zero on failure.
>> >> */
>> >> int hal_watchdog_stop(void);
>> >>
>> >> /*
>> >> * Tickles the watchdog.
>> >> */
>> >> void hal_watchdog_tickle(void);
>> >>

Re: hal watchdog

Posted by Mathew Calmer <mc...@exploramednc7.com>.

I also worry (slightly) that the default time between sanity waking up and the watchdog firing may not be enough. It is configurable so that is all good, but maybe the default should be a bit longer as sanity is checked in the idle task and if a system has lots of tasks the idle task may not run for a bit (although 200 msecs is a while).

Feels to me that if your idle task is taking longer than 200 msecs to circle, you probably _want_ your watchdog kicking in.

200 ms is really long.


> On Sep 22, 2016, at 10:58 PM, Sterling Hughes <st...@apache.org><mailto:sterling@apache.org%3E> wrote:
>
> Hey —
>
> A follow up on this, I’ve committed initial support for the Nordic platforms for the watchdog, along with modifying this API a bit.
>
> I made the watchdog expiry a millisecond value (in hal_watchdog_init()), as pretty much every watchdog I’ve seen executes in millisecond resolution. I removed the hal_watchdog_stop() function, as many processors don’t offer the ability to stop the watchdog once started.
>
> I also hooked the watchdog into the OS with two new configuration options:
>
> SANITY_INTERVAL:
> description: 'The interval at which the sanity checks should run, should be at least 200ms prior to watchdog'
> value: 299500
> WATCHDOG_INTERVAL:
> description: 'The interval at which the watchdog should reset if not tickled, in ms'
> value: 300000
>
> These are default values, defined by the OS (libs/os/pkg.yml), that can be overridden by the BSP.
>
> By default the OS initializes the watchdog with hal_watchdog_init() when OS start is called. I have removed the sanity task (to save stack space, and make it run by default), and instead, put all sanity related functions within the idle task.
>
> The logic is here (os.c), for reference. os_sanity_run() calls into os_sanity.c, and runs the sanity checks. Code has also been added to make sure that we don’t sleep beyond the sanity interval, and trip up the watchdog in our idle loop.
>
> sanity_itvl_ticks = (MYNEWT_VAL(SANITY_INTERVAL) * OS_TICKS_PER_SEC) / 1000;
> sanity_last = 0;
>
> hal_watchdog_tickle();
>
> while (1) {
> ++g_os_idle_ctr;
>
> now = os_time_get();
> if (OS_TIME_TICK_GT(now, sanity_last + sanity_itvl_ticks)) {
> os_sanity_run();
> /* Tickle the watchdog after successfully running sanity */
> hal_watchdog_tickle();
> sanity_last = now;
> }
>
> OS_ENTER_CRITICAL(sr);
> now = os_time_get();
> sticks = os_sched_wakeup_ticks(now);
> cticks = os_callout_wakeup_ticks(now);
> iticks = min(sticks, cticks);
> /* Wakeup in time to run sanity as well from the idle context,
> * as the idle task does not schedule itself.
> */
> iticks = min(iticks, ((sanity_last + sanity_itvl_ticks) - now));
>
> if (iticks < MIN_IDLE_TICKS) {
> iticks = 0;
> } else if (iticks > MAX_IDLE_TICKS) {
> iticks = MAX_IDLE_TICKS;
> } else {
> /* NOTHING */
> }
> /* Tell the architecture specific support to put the processor to sleep
> * for 'n' ticks.
> */
> os_tick_idle(iticks);
> OS_EXIT_CRITICAL(sr);
>
>
> For BSPs where hal_watchdog has not been implemented, calling these functions has no effect, and I’ve added stubs to the stm32f4 and native BSPs.
>
> Sterling
>
>
> On 30 Aug 2016, at 15:01, marko kiiskila wrote:
>
>>
>>> On Aug 30, 2016, at 12:59 PM, Mathew Calmer <mc...@exploramednc7.com><mailto:mcalmer@exploramednc7.com%3E> wrote:
>>>
>>> On August 30, 2016 at 12:28:50 PM, will sanfilippo (wills@runtime.io<ma...@runtime.io><mailto:wills@runtime.io%3E>) wrote:
>>> Sounds reasonable. As I am sure you know, doing it through the sanity task sometimes is an issue getting the time right as you would then need to know the worst-case timing of all the tasks that could be running… but any way you cut it, you have to put some time limit on that… in past lives I have seen some pretty complicated ways to deal with this but this seems reasonable and if developers need something different they can implement it with this hal.
>>>
>>>
>>>
>>> I would consider making the return value of init() be the time or some reference to the time that was actually set. So, for example, if the user asks for 10000 ticks, and the system can only support 2000, it could return 2000 from init, after trying it’s best to support the request.
>>>
>>> It could be done in powers of two or some other mechanism, but conceptually using that return value to explain what was actually set would be a nice interface. If watchdog was not implemented on given hardware, default return could be negative (error) or 0, implying watchdog was not set (although 0 also implies success… so…).
>>>
>>
>> I was thinking the same regarding return value from init(), but had not written it down in the
>> API proposal.
>>
>> I’m including updated version here.
>>
>> /*
>> * Set a recurring watchdog timer to fire no sooner than in 'expire_secs'
>> * seconds. Watchdog should be tickled periodically with a frequency
>> * smaller than 'expire_secs'. Watchdog needs to be then started with
>> * a call to hal_watchdog_enable().
>> *
>> * @param expire_secs Watchdog timer expiration time
>> *
>> * @return < 0 on failure; on success return the actual
>> * expiration time as positive value
>> */
>> int hal_watchdog_init(int expire_secs);
>>
>> /*
>> * Starts the watchdog.
>> *
>> * @return 0 on success; non-zero on failure.
>> */
>> int hal_watchdog_enable(void);
>>
>> /*
>> * Stops the watchdog.
>> *
>> * @return 0 on success; non-zero on failure.
>> */
>> int hal_watchdog_stop(void);
>>
>> /*
>> * Tickles the watchdog.
>> */
>> void hal_watchdog_tickle(void);
>>

________________________________

Re: hal watchdog

Posted by will sanfilippo <wi...@runtime.io>.

Oh yeah, sorry. I dont know why I said 200 when it is clearly 500 :-) Thanks for pointing that out.

All good about the msec v sec thing.


> On Sep 23, 2016, at 11:02 AM, Sterling Hughes <st...@apache.org> wrote:
> 
> 2 reasons:
> 
> - most (all) chips provide that granularity that i’ve seen, so why not?  IMO, it’s not up for us to decide what people set watchdog interval to in the HAL if it’s portable.
> 
> - as you point out, it’s convenient to have a consistent timebase between sanity and watchdog, given how i’ve staggered the wakeup/watchdog tickle.
> 
> I worried about that too, BTW: the default is 500msecs, I just assert if it’s less than 200 msecs in os.c
> 
> sterling
> 
> On 23 Sep 2016, at 10:51, will sanfilippo wrote:
> 
>> Why is the interval defined in milliseconds btw? Is there a particular reason for it? Is it because you wanted to be able to separate the sanity interval and the watchdog interval by less than one second? Or are you worried that some watchdogs may have very small timeouts and milliseconds would be useful? Just curious… and you can probably tell from my curiosity that seconds seems to be enough resolution but this is really no big deal.
>> 
>> I also worry (slightly) that the default time between sanity waking up and the watchdog firing may not be enough. It is configurable so that is all good, but maybe the default should be a bit longer as sanity is checked in the idle task and if a system has lots of tasks the idle task may not run for a bit (although 200 msecs is a while).
>> 
>>> On Sep 22, 2016, at 10:58 PM, Sterling Hughes <st...@apache.org> wrote:
>>> 
>>> Hey —
>>> 
>>> A follow up on this, I’ve committed initial support for the Nordic platforms for the watchdog, along with modifying this API a bit.
>>> 
>>> I made the watchdog expiry a millisecond value (in hal_watchdog_init()), as pretty much every watchdog I’ve seen executes in millisecond resolution.  I removed the hal_watchdog_stop() function, as many processors don’t offer the ability to stop the watchdog once started.
>>> 
>>> I also hooked the watchdog into the OS with two new configuration options:
>>> 
>>>   SANITY_INTERVAL:
>>>       description: 'The interval at which the sanity checks should run, should be at least 200ms prior to watchdog'
>>>       value: 299500
>>>   WATCHDOG_INTERVAL:
>>>       description: 'The interval at which the watchdog should reset if not tickled, in ms'
>>>       value: 300000
>>> 
>>> These are default values, defined by the OS (libs/os/pkg.yml), that can be overridden by the BSP.
>>> 
>>> By default the OS initializes the watchdog with hal_watchdog_init() when OS start is called.  I have removed the sanity task (to save stack space, and make it run by default), and instead, put all sanity related functions within the idle task.
>>> 
>>> The logic is here (os.c), for reference.  os_sanity_run() calls into os_sanity.c, and runs the sanity checks.  Code has also been added to make sure that we don’t sleep beyond the sanity interval, and trip up the watchdog in our idle loop.
>>> 
>>>   sanity_itvl_ticks = (MYNEWT_VAL(SANITY_INTERVAL) * OS_TICKS_PER_SEC) / 1000;
>>>   sanity_last = 0;
>>> 
>>>   hal_watchdog_tickle();
>>> 
>>>   while (1) {
>>>       ++g_os_idle_ctr;
>>> 
>>>       now = os_time_get();
>>>       if (OS_TIME_TICK_GT(now, sanity_last + sanity_itvl_ticks)) {
>>>           os_sanity_run();
>>>           /* Tickle the watchdog after successfully running sanity */
>>>           hal_watchdog_tickle();
>>>           sanity_last = now;
>>>       }
>>> 
>>>       OS_ENTER_CRITICAL(sr);
>>>       now = os_time_get();
>>>       sticks = os_sched_wakeup_ticks(now);
>>>       cticks = os_callout_wakeup_ticks(now);
>>>       iticks = min(sticks, cticks);
>>>       /* Wakeup in time to run sanity as well from the idle context,
>>>        * as the idle task does not schedule itself.
>>>        */
>>>       iticks = min(iticks, ((sanity_last + sanity_itvl_ticks) - now));
>>> 
>>>       if (iticks < MIN_IDLE_TICKS) {
>>>           iticks = 0;
>>>       } else if (iticks > MAX_IDLE_TICKS) {
>>>           iticks = MAX_IDLE_TICKS;
>>>       } else {
>>>           /* NOTHING */
>>>       }
>>>       /* Tell the architecture specific support to put the processor to sleep
>>>        * for 'n' ticks.
>>>        */
>>>       os_tick_idle(iticks);
>>>       OS_EXIT_CRITICAL(sr);
>>> 
>>> 
>>> For BSPs where hal_watchdog has not been implemented, calling these functions has no effect, and I’ve added stubs to the stm32f4 and native BSPs.
>>> 
>>> Sterling
>>> 
>>> 
>>> On 30 Aug 2016, at 15:01, marko kiiskila wrote:
>>> 
>>>> 
>>>>> On Aug 30, 2016, at 12:59 PM, Mathew Calmer <mc...@exploramednc7.com> wrote:
>>>>> 
>>>>> On August 30, 2016 at 12:28:50 PM, will sanfilippo (wills@runtime.io<ma...@runtime.io>) wrote:
>>>>> Sounds reasonable. As I am sure you know, doing it through the sanity task sometimes is an issue getting the time right as you would then need to know the worst-case timing of all the tasks that could be running… but any way you cut it, you have to put some time limit on that… in past lives I have seen some pretty complicated ways to deal with this but this seems reasonable and if developers need something different they can implement it with this hal.
>>>>> 
>>>>> 
>>>>> 
>>>>> I would consider making the return value of init() be the time or some reference to the time that was actually set.  So, for example, if the user asks for 10000 ticks, and the system can only support 2000, it could return 2000 from init, after trying it’s best to support the request.
>>>>> 
>>>>> It could be done in powers of two or some other mechanism, but conceptually using that return value to explain what was actually set would be a nice interface.   If watchdog was not implemented on given hardware, default return could be negative (error) or 0, implying watchdog was not set (although 0 also implies success… so…).
>>>>> 
>>>> 
>>>> I was thinking the same regarding return value from init(), but had not written it down in the
>>>> API proposal.
>>>> 
>>>> I’m including updated version here.
>>>> 
>>>> /*
>>>> * Set a recurring watchdog timer to fire no sooner than in 'expire_secs'
>>>> * seconds. Watchdog should be tickled periodically with a frequency
>>>> * smaller than 'expire_secs'. Watchdog needs to be then started with
>>>> * a call to hal_watchdog_enable().
>>>> *
>>>> * @param expire_secs           Watchdog timer expiration time
>>>> *
>>>> * @return                      < 0 on failure; on success return the actual
>>>> *                              expiration time as positive value
>>>> */
>>>> int hal_watchdog_init(int expire_secs);
>>>> 
>>>> /*
>>>> * Starts the watchdog.
>>>> *
>>>> * @return                      0 on success; non-zero on failure.
>>>> */
>>>> int hal_watchdog_enable(void);
>>>> 
>>>> /*
>>>> * Stops the watchdog.
>>>> *
>>>> * @return                      0 on success; non-zero on failure.
>>>> */
>>>> int hal_watchdog_stop(void);
>>>> 
>>>> /*
>>>> * Tickles the watchdog.
>>>> */
>>>> void hal_watchdog_tickle(void);
>>>> 
>>

Re: hal watchdog

Posted by Sterling Hughes <st...@apache.org>.

2 reasons:

- most (all) chips provide that granularity that i\u2019ve seen, so why 
not?  IMO, it\u2019s not up for us to decide what people set watchdog 
interval to in the HAL if it\u2019s portable.

- as you point out, it\u2019s convenient to have a consistent timebase 
between sanity and watchdog, given how i\u2019ve staggered the 
wakeup/watchdog tickle.

I worried about that too, BTW: the default is 500msecs, I just assert if 
it\u2019s less than 200 msecs in os.c

sterling

On 23 Sep 2016, at 10:51, will sanfilippo wrote:

> Why is the interval defined in milliseconds btw? Is there a particular 
> reason for it? Is it because you wanted to be able to separate the 
> sanity interval and the watchdog interval by less than one second? Or 
> are you worried that some watchdogs may have very small timeouts and 
> milliseconds would be useful? Just curious\u2026 and you can probably 
> tell from my curiosity that seconds seems to be enough resolution but 
> this is really no big deal.
>
> I also worry (slightly) that the default time between sanity waking up 
> and the watchdog firing may not be enough. It is configurable so that 
> is all good, but maybe the default should be a bit longer as sanity is 
> checked in the idle task and if a system has lots of tasks the idle 
> task may not run for a bit (although 200 msecs is a while).
>
>> On Sep 22, 2016, at 10:58 PM, Sterling Hughes <st...@apache.org> 
>> wrote:
>>
>> Hey \u2014
>>
>> A follow up on this, I\u2019ve committed initial support for the Nordic 
>> platforms for the watchdog, along with modifying this API a bit.
>>
>> I made the watchdog expiry a millisecond value (in 
>> hal_watchdog_init()), as pretty much every watchdog I\u2019ve seen 
>> executes in millisecond resolution.  I removed the 
>> hal_watchdog_stop() function, as many processors don\u2019t offer the 
>> ability to stop the watchdog once started.
>>
>> I also hooked the watchdog into the OS with two new configuration 
>> options:
>>
>>    SANITY_INTERVAL:
>>        description: 'The interval at which the sanity checks should 
>> run, should be at least 200ms prior to watchdog'
>>        value: 299500
>>    WATCHDOG_INTERVAL:
>>        description: 'The interval at which the watchdog should reset 
>> if not tickled, in ms'
>>        value: 300000
>>
>> These are default values, defined by the OS (libs/os/pkg.yml), that 
>> can be overridden by the BSP.
>>
>> By default the OS initializes the watchdog with hal_watchdog_init() 
>> when OS start is called.  I have removed the sanity task (to save 
>> stack space, and make it run by default), and instead, put all sanity 
>> related functions within the idle task.
>>
>> The logic is here (os.c), for reference.  os_sanity_run() calls into 
>> os_sanity.c, and runs the sanity checks.  Code has also been added to 
>> make sure that we don\u2019t sleep beyond the sanity interval, and trip 
>> up the watchdog in our idle loop.
>>
>>    sanity_itvl_ticks = (MYNEWT_VAL(SANITY_INTERVAL) * 
>> OS_TICKS_PER_SEC) / 1000;
>>    sanity_last = 0;
>>
>>    hal_watchdog_tickle();
>>
>>    while (1) {
>>        ++g_os_idle_ctr;
>>
>>        now = os_time_get();
>>        if (OS_TIME_TICK_GT(now, sanity_last + sanity_itvl_ticks)) {
>>            os_sanity_run();
>>            /* Tickle the watchdog after successfully running sanity 
>> */
>>            hal_watchdog_tickle();
>>            sanity_last = now;
>>        }
>>
>>        OS_ENTER_CRITICAL(sr);
>>        now = os_time_get();
>>        sticks = os_sched_wakeup_ticks(now);
>>        cticks = os_callout_wakeup_ticks(now);
>>        iticks = min(sticks, cticks);
>>        /* Wakeup in time to run sanity as well from the idle context,
>>         * as the idle task does not schedule itself.
>>         */
>>        iticks = min(iticks, ((sanity_last + sanity_itvl_ticks) - 
>> now));
>>
>>        if (iticks < MIN_IDLE_TICKS) {
>>            iticks = 0;
>>        } else if (iticks > MAX_IDLE_TICKS) {
>>            iticks = MAX_IDLE_TICKS;
>>        } else {
>>            /* NOTHING */
>>        }
>>        /* Tell the architecture specific support to put the processor 
>> to sleep
>>         * for 'n' ticks.
>>         */
>>        os_tick_idle(iticks);
>>        OS_EXIT_CRITICAL(sr);
>>
>>
>> For BSPs where hal_watchdog has not been implemented, calling these 
>> functions has no effect, and I\u2019ve added stubs to the stm32f4 and 
>> native BSPs.
>>
>> Sterling
>>
>>
>> On 30 Aug 2016, at 15:01, marko kiiskila wrote:
>>
>>>
>>>> On Aug 30, 2016, at 12:59 PM, Mathew Calmer 
>>>> <mc...@exploramednc7.com> wrote:
>>>>
>>>> On August 30, 2016 at 12:28:50 PM, will sanfilippo 
>>>> (wills@runtime.io<ma...@runtime.io>) wrote:
>>>> Sounds reasonable. As I am sure you know, doing it through the 
>>>> sanity task sometimes is an issue getting the time right as you 
>>>> would then need to know the worst-case timing of all the tasks that 
>>>> could be running\u2026 but any way you cut it, you have to put some 
>>>> time limit on that\u2026 in past lives I have seen some pretty 
>>>> complicated ways to deal with this but this seems reasonable and if 
>>>> developers need something different they can implement it with this 
>>>> hal.
>>>>
>>>>
>>>>
>>>> I would consider making the return value of init() be the time or 
>>>> some reference to the time that was actually set.  So, for example, 
>>>> if the user asks for 10000 ticks, and the system can only support 
>>>> 2000, it could return 2000 from init, after trying it\u2019s best to 
>>>> support the request.
>>>>
>>>> It could be done in powers of two or some other mechanism, but 
>>>> conceptually using that return value to explain what was actually 
>>>> set would be a nice interface.   If watchdog was not implemented on 
>>>> given hardware, default return could be negative (error) or 0, 
>>>> implying watchdog was not set (although 0 also implies success\u2026 
>>>> so\u2026).
>>>>
>>>
>>> I was thinking the same regarding return value from init(), but had 
>>> not written it down in the
>>> API proposal.
>>>
>>> I\u2019m including updated version here.
>>>
>>> /*
>>> * Set a recurring watchdog timer to fire no sooner than in 
>>> 'expire_secs'
>>> * seconds. Watchdog should be tickled periodically with a frequency
>>> * smaller than 'expire_secs'. Watchdog needs to be then started with
>>> * a call to hal_watchdog_enable().
>>> *
>>> * @param expire_secs           Watchdog timer expiration time
>>> *
>>> * @return                      < 0 on failure; on success return the 
>>> actual
>>> *                              expiration time as positive value
>>> */
>>> int hal_watchdog_init(int expire_secs);
>>>
>>> /*
>>> * Starts the watchdog.
>>> *
>>> * @return                      0 on success; non-zero on failure.
>>> */
>>> int hal_watchdog_enable(void);
>>>
>>> /*
>>> * Stops the watchdog.
>>> *
>>> * @return                      0 on success; non-zero on failure.
>>> */
>>> int hal_watchdog_stop(void);
>>>
>>> /*
>>> * Tickles the watchdog.
>>> */
>>> void hal_watchdog_tickle(void);
>>>
>

Re: hal watchdog

Posted by will sanfilippo <wi...@runtime.io>.

Why is the interval defined in milliseconds btw? Is there a particular reason for it? Is it because you wanted to be able to separate the sanity interval and the watchdog interval by less than one second? Or are you worried that some watchdogs may have very small timeouts and milliseconds would be useful? Just curious… and you can probably tell from my curiosity that seconds seems to be enough resolution but this is really no big deal.

I also worry (slightly) that the default time between sanity waking up and the watchdog firing may not be enough. It is configurable so that is all good, but maybe the default should be a bit longer as sanity is checked in the idle task and if a system has lots of tasks the idle task may not run for a bit (although 200 msecs is a while).


> On Sep 22, 2016, at 10:58 PM, Sterling Hughes <st...@apache.org> wrote:
> 
> Hey —
> 
> A follow up on this, I’ve committed initial support for the Nordic platforms for the watchdog, along with modifying this API a bit.
> 
> I made the watchdog expiry a millisecond value (in hal_watchdog_init()), as pretty much every watchdog I’ve seen executes in millisecond resolution.  I removed the hal_watchdog_stop() function, as many processors don’t offer the ability to stop the watchdog once started.
> 
> I also hooked the watchdog into the OS with two new configuration options:
> 
>    SANITY_INTERVAL:
>        description: 'The interval at which the sanity checks should run, should be at least 200ms prior to watchdog'
>        value: 299500
>    WATCHDOG_INTERVAL:
>        description: 'The interval at which the watchdog should reset if not tickled, in ms'
>        value: 300000
> 
> These are default values, defined by the OS (libs/os/pkg.yml), that can be overridden by the BSP.
> 
> By default the OS initializes the watchdog with hal_watchdog_init() when OS start is called.  I have removed the sanity task (to save stack space, and make it run by default), and instead, put all sanity related functions within the idle task.
> 
> The logic is here (os.c), for reference.  os_sanity_run() calls into os_sanity.c, and runs the sanity checks.  Code has also been added to make sure that we don’t sleep beyond the sanity interval, and trip up the watchdog in our idle loop.
> 
>    sanity_itvl_ticks = (MYNEWT_VAL(SANITY_INTERVAL) * OS_TICKS_PER_SEC) / 1000;
>    sanity_last = 0;
> 
>    hal_watchdog_tickle();
> 
>    while (1) {
>        ++g_os_idle_ctr;
> 
>        now = os_time_get();
>        if (OS_TIME_TICK_GT(now, sanity_last + sanity_itvl_ticks)) {
>            os_sanity_run();
>            /* Tickle the watchdog after successfully running sanity */
>            hal_watchdog_tickle();
>            sanity_last = now;
>        }
> 
>        OS_ENTER_CRITICAL(sr);
>        now = os_time_get();
>        sticks = os_sched_wakeup_ticks(now);
>        cticks = os_callout_wakeup_ticks(now);
>        iticks = min(sticks, cticks);
>        /* Wakeup in time to run sanity as well from the idle context,
>         * as the idle task does not schedule itself.
>         */
>        iticks = min(iticks, ((sanity_last + sanity_itvl_ticks) - now));
> 
>        if (iticks < MIN_IDLE_TICKS) {
>            iticks = 0;
>        } else if (iticks > MAX_IDLE_TICKS) {
>            iticks = MAX_IDLE_TICKS;
>        } else {
>            /* NOTHING */
>        }
>        /* Tell the architecture specific support to put the processor to sleep
>         * for 'n' ticks.
>         */
>        os_tick_idle(iticks);
>        OS_EXIT_CRITICAL(sr);
> 
> 
> For BSPs where hal_watchdog has not been implemented, calling these functions has no effect, and I’ve added stubs to the stm32f4 and native BSPs.
> 
> Sterling
> 
> 
> On 30 Aug 2016, at 15:01, marko kiiskila wrote:
> 
>> 
>>> On Aug 30, 2016, at 12:59 PM, Mathew Calmer <mc...@exploramednc7.com> wrote:
>>> 
>>> On August 30, 2016 at 12:28:50 PM, will sanfilippo (wills@runtime.io<ma...@runtime.io>) wrote:
>>> Sounds reasonable. As I am sure you know, doing it through the sanity task sometimes is an issue getting the time right as you would then need to know the worst-case timing of all the tasks that could be running… but any way you cut it, you have to put some time limit on that… in past lives I have seen some pretty complicated ways to deal with this but this seems reasonable and if developers need something different they can implement it with this hal.
>>> 
>>> 
>>> 
>>> I would consider making the return value of init() be the time or some reference to the time that was actually set.  So, for example, if the user asks for 10000 ticks, and the system can only support 2000, it could return 2000 from init, after trying it’s best to support the request.
>>> 
>>> It could be done in powers of two or some other mechanism, but conceptually using that return value to explain what was actually set would be a nice interface.   If watchdog was not implemented on given hardware, default return could be negative (error) or 0, implying watchdog was not set (although 0 also implies success… so…).
>>> 
>> 
>> I was thinking the same regarding return value from init(), but had not written it down in the
>> API proposal.
>> 
>> I’m including updated version here.
>> 
>> /*
>> * Set a recurring watchdog timer to fire no sooner than in 'expire_secs'
>> * seconds. Watchdog should be tickled periodically with a frequency
>> * smaller than 'expire_secs'. Watchdog needs to be then started with
>> * a call to hal_watchdog_enable().
>> *
>> * @param expire_secs           Watchdog timer expiration time
>> *
>> * @return                      < 0 on failure; on success return the actual
>> *                              expiration time as positive value
>> */
>> int hal_watchdog_init(int expire_secs);
>> 
>> /*
>> * Starts the watchdog.
>> *
>> * @return                      0 on success; non-zero on failure.
>> */
>> int hal_watchdog_enable(void);
>> 
>> /*
>> * Stops the watchdog.
>> *
>> * @return                      0 on success; non-zero on failure.
>> */
>> int hal_watchdog_stop(void);
>> 
>> /*
>> * Tickles the watchdog.
>> */
>> void hal_watchdog_tickle(void);
>>