Posted to dev@whimsical.apache.org by sebb <se...@gmail.com> on 2016/02/11 02:23:57 UTC

Latest updates file for public json data files

Most of the public json data files have dates in them that show when
the raw data was last updated.
The files themselves are only replaced when the data in them changes,
so the If-Modified-Since check works properly.

Now some data sources change quite infrequently, so it can be hard to
determine whether the file really is up to date or the cron job has
simply not been run recently.

There are public log files; however, these would need to be analysed to
determine whether the job completed OK, and even an empty log file could
be associated with a failed job if the error was not caught and logged -
i.e. it only went to stderr.

Possible solutions might be:
- scripts add a standard entry to each log file on successful completion.
This could include the start and end times of the job.
Quite simple to implement, but a bit awkward to use.

- scripts update a shared json file on successful completion.
More care needed to implement (locking needed), but much easier to use.
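
For illustration, the second option could look something like the rough Ruby
sketch below (the status file path and key names are invented, not an existing
Whimsy convention); File#flock takes care of the locking:

    require 'json'
    require 'time'

    # Hypothetical location of the shared status file
    STATUS_FILE = '/srv/whimsy/www/public/last-updated.json'

    # Called by each generating script once its json output has been written
    def record_success(job_name, started_at)
      File.open(STATUS_FILE, File::RDWR | File::CREAT, 0644) do |file|
        file.flock(File::LOCK_EX)           # serialise concurrent updates
        text = file.read
        status = text.empty? ? {} : JSON.parse(text)
        status[job_name] = {
          'started' => started_at.getutc.iso8601,
          'ended'   => Time.now.getutc.iso8601
        }
        file.rewind
        file.write(JSON.pretty_generate(status))
        file.flush
        file.truncate(file.pos)             # discard any leftover old content
      end
    end

    # e.g. at the end of a cron job script:
    #   start = Time.now
    #   ... write the json data file ...
    #   record_success('committee-info', start)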

Thoughts?

Re: Latest updates file for public json data files

Posted by Sam Ruby <ru...@intertwingly.net>.
On Thu, Feb 11, 2016 at 12:57 PM, sebb <se...@gmail.com> wrote:
>
> As it stands, if one or more cron jobs stop running, I don't think PMB
> will notice unless the last run happened to fail.

https://github.com/apache/whimsy/commit/d8734785283e842f7ecd804dae8d0a22230d19c5

Feel free to tweak, revert, replace, or whatever.

- Sam Ruby

Re: Latest updates file for public json data files

Posted by sebb <se...@gmail.com>.
On 11 February 2016 at 17:34, Sam Ruby <ru...@intertwingly.net> wrote:
> On Thu, Feb 11, 2016 at 10:58 AM, sebb <se...@gmail.com> wrote:
>> On 11 February 2016 at 01:53, Sam Ruby <ru...@intertwingly.net> wrote:
>>> On Wed, Feb 10, 2016 at 8:23 PM, sebb <se...@gmail.com> wrote:
>>>> Most of the public json data files have dates in them that show when
>>>> the raw data was last updated.
>>>> The files themselves are only replaced when the data in them changes,
>>>> so the If-Modified-Since check works properly.
>>>>
>>>> Now some data sources change quite infrequently, so it can be hard to
>>>> determine whether the file really is up to date or the cron job has
>>>> simply not been run recently.
>>>>
>>>> There are public log files; however, these would need to be analysed to
>>>> determine whether the job completed OK, and even an empty log file could
>>>> be associated with a failed job if the error was not caught and logged -
>>>> i.e. it only went to stderr.
>>>
>>> The cron jobs direct stderr to the log files (2>&1):
>>>
>>> https://github.com/apache/infrastructure-puppet/blob/deployment/modules/whimsy_server/manifests/cronjobs.pp
>>
>> OK.
>>
>> But the issue still arises - the contents of the log file have to be
>> analysed in order to determine if the output was created OK.
>> However, if the job added a marker, that check would be trivial.
>
> I'm clearly not understanding what problem you are trying to solve.

So that automated scripts can provide info on when the data was last updated.

> Yes, you have to look at the contents to determine if the job ran
> successfully.  Adding a marker to the contents doesn't eliminate the
> need to look at the contents.

No, but it's a lot easier to check for a specific text marker.
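
For example (rough Ruby sketch; the marker text and log path are invented,
not an existing convention):

    # Assumes each job appends a line like "JOB COMPLETED OK: <start> .. <end>"
    # as its final act; the exact wording is only a suggestion.
    MARKER = /^JOB COMPLETED OK:/

    def completed_ok?(log_file)
      return false unless File.exist?(log_file)
      # the marker should be one of the last few lines written
      File.readlines(log_file).last(5).any? { |line| line =~ MARKER }
    end

    # completed_ok?('/srv/whimsy/www/logs/public-committee-info')  # hypothetical path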

> Example problem #1: job didn't run at all.  Symptom to look for: no
> updated mtime.
>
> Example problem #2: job ran and blew up.  Symptom to look for:
> unexpected output produced.

Detecting the unexpected is probably fairly easy for a human who has
some idea what the script does, but it is not necessarily easy to
define for an automated check - or even for a human who is not au fait
with the app.

> But whatever, if you want to add a marker, go for it.

Would you be OK with a combined JSON status file instead?

>>>> Possible solutions might be:
>>>> - scripts add a standard entry to each log file on successful completion.
>>>> This could include the start and end times of the job.
>>>> Quite simple to implement, but a bit awkward to use.
>>>>
>>>> - scripts update a shared json file on successful completion.
>>>> More care needed to implement (locking needed), but much easier to use.
>>>>
>>>> Thoughts?
>>>
>>> The log files themselves have an mtime: https://whimsy-test.apache.org/logs/
>>
>> Yes, but that only shows when cron last ran the job, not when it last
>> completed OK.
>>
>> Also the log file names don't agree with the json files (that could
>> easily be fixed).
>>
>>> status/monitors/public_json.rb could issue a warning (or higher?) if
>>> the log has not been updated recently.
>>
>> Is the monitor itself monitored?
>
> Again, I may not be understanding the question.  It certainly is:
>
> https://www.pingmybox.com/dashboard?location=470

I was forgetting that PMB invokes a CGI, which in turn invokes the status scripts.

As opposed to the status scripts running as a cron job whose output is
then examined.

>>> Depending on the level chosen, this could trigger an alert.
>>
>> That is a good idea.
>
> :-)

As it stands, if one or more cron jobs stop running, I don't think PMB
will notice unless the last run happened to fail.

>
> - Sam Ruby

Re: Latest updates file for public json data files

Posted by Sam Ruby <ru...@intertwingly.net>.
On Thu, Feb 11, 2016 at 10:58 AM, sebb <se...@gmail.com> wrote:
> On 11 February 2016 at 01:53, Sam Ruby <ru...@intertwingly.net> wrote:
>> On Wed, Feb 10, 2016 at 8:23 PM, sebb <se...@gmail.com> wrote:
>>> Most of the public json data files have dates in them that show when
>>> the raw data was last updated.
>>> The files themselves are only replaced when the data in them changes,
>>> so the If-Modified-Since check works properly.
>>>
>>> Now some data sources change quite infrequently, so it can be hard to
>>> determine whether the file really is up to date or the cron job has
>>> simply not been run recently.
>>>
>>> There are public log files; however, these would need to be analysed to
>>> determine whether the job completed OK, and even an empty log file could
>>> be associated with a failed job if the error was not caught and logged -
>>> i.e. it only went to stderr.
>>
>> The cron jobs direct stderr to the log files (2>&1):
>>
>> https://github.com/apache/infrastructure-puppet/blob/deployment/modules/whimsy_server/manifests/cronjobs.pp
>
> OK.
>
> But the issue still arises - the contents of the log file have to be
> analysed in order to determine if the output was created OK.
> However, if the job added a marker, that check would be trivial.

I'm clearly not understanding what problem you are trying to solve.
Yes, you have to look at the contents to determine if the job ran
successfully.  Adding a marker to the contents doesn't eliminate the
need to look at the contents.

Example problem #1: job didn't run at all.  Symptom to look for: no
updated mtime.

Example problem #2: job ran and blew up.  Symptom to look for:
unexpected output produced.
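
Roughly, those two checks could be automated along these lines (Ruby sketch;
the 24-hour threshold and the error patterns are only guesses, which is part
of the difficulty with #2):

    MAX_AGE = 24 * 60 * 60  # seconds; assumes every job runs at least daily

    def job_looks_healthy?(log_file)
      return false unless File.exist?(log_file)

      # problem #1: job didn't run at all -> stale mtime
      return false if Time.now - File.mtime(log_file) > MAX_AGE

      # problem #2: job ran and blew up -> output that *looks* like an error;
      # only a heuristic, since "unexpected" is hard to pin down mechanically
      error_pattern = /error|exception|\.rb:\d+:in /i
      File.foreach(log_file).none? { |line| line =~ error_pattern }
    end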

But whatever, if you want to add a marker, go for it.

>>> Possible solutions might be:
>>> - scripts add a standard entry to each log file on successful completion.
>>> This could include the start and end times of the job.
>>> Quite simple to implement, but a bit awkward to use.
>>>
>>> - scripts update a shared json file on successful completion.
>>> More care needed to implement (locking needed), but much easier to use.
>>>
>>> Thoughts?
>>
>> The log files themselves have an mtime: https://whimsy-test.apache.org/logs/
>
> Yes, but that only shows when cron last ran the job, not when it last
> completed OK.
>
> Also the log file names don't agree with the json files (that could
> easily be fixed).
>
>> status/monitors/public_json.rb could issue a warning (or higher?) if
>> the log has not been updated recently.
>
> Is the monitor itself monitored?

Again, I may not be understanding the question.  It certainly is:

https://www.pingmybox.com/dashboard?location=470

>> Depending on the level chosen, this could trigger an alert.
>
> That is a good idea.

:-)

- Sam Ruby

Re: Latest updates file for public json data files

Posted by sebb <se...@gmail.com>.
On 11 February 2016 at 01:53, Sam Ruby <ru...@intertwingly.net> wrote:
> On Wed, Feb 10, 2016 at 8:23 PM, sebb <se...@gmail.com> wrote:
>> Most of the public json data files have dates in them that show when
>> the raw data was last updated.
>> The files themselves are only replaced when the data in them changes,
>> so the If-Modified-Since check works properly.
>>
>> Now some data sources change quite infrequently, so it can be hard to
>> determine whether the file really is up to date or the cron job has
>> simply not been run recently.
>>
>> There are public log files; however, these would need to be analysed to
>> determine whether the job completed OK, and even an empty log file could
>> be associated with a failed job if the error was not caught and logged -
>> i.e. it only went to stderr.
>
> The cron jobs direct stderr to the log files (2>&1):
>
> https://github.com/apache/infrastructure-puppet/blob/deployment/modules/whimsy_server/manifests/cronjobs.pp

OK.

But the issue still arises - the contents of the log file have to be
analysed in order to determine if the output was created OK.
However, if the job added a marker, that check would be trivial.

>> Possible solutions might be:
>> - scripts add a standard entry to each log file on successful completion.
>> This could include the start and end times of the job.
>> Quite simple to implement, but a bit awkward to use.
>>
>> - scripts update a shared json file on successful completion.
>> More care needed to implement (locking needed), but much easier to use.
>>
>> Thoughts?
>
> The log files themselves have an mtime: https://whimsy-test.apache.org/logs/

Yes, but that only shows when cron last ran the job, not when it last
completed OK.

Also the log file names don't agree with the json files (that could
easily be fixed).

> status/monitors/public_json.rb could issue a warning (or higher?) if
> the log has not been updated recently.

Is the monitor itself monitored?

> Depending on the level chosen, this could trigger an alert.

That is a good idea.

> - Sam Ruby

Re: Latest updates file for public json data files

Posted by Sam Ruby <ru...@intertwingly.net>.
On Wed, Feb 10, 2016 at 8:23 PM, sebb <se...@gmail.com> wrote:
> Most of the public json data files have dates in them that show when
> the raw data was last updated.
> The files themselves are only replaced when the data in them changes,
> so the If-Modified-Since check works properly.
>
> Now some data sources change quite infrequently, so it can be hard to
> determine whether the file really is up to date or the cron job has
> simply not been run recently.
>
> There are public log files; however, these would need to be analysed to
> determine whether the job completed OK, and even an empty log file could
> be associated with a failed job if the error was not caught and logged -
> i.e. it only went to stderr.

The cron jobs direct stderr to the log files (2>&1):

https://github.com/apache/infrastructure-puppet/blob/deployment/modules/whimsy_server/manifests/cronjobs.pp

> Possible solutions might be:
> - scripts add a standard entry to each log file on successful completion.
> This could include the start and end times of the job.
> Quite simple to implement, but a bit awkward to use.
>
> - scripts update a shared json file on successful completion.
> More care needed to implement (locking needed), but much easier to use.
>
> Thoughts?

The log files themselves have an mtime: https://whimsy-test.apache.org/logs/

status/monitors/public_json.rb could issue a warning (or higher?) if
the log has not been updated recently.

Depending on the level chosen, this could trigger an alert.
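
A standalone sketch of the kind of check such a monitor could make (this is
not the actual monitor API; the log directory, thresholds and level names are
assumptions):

    LOG_DIR = '/srv/whimsy/www/logs'  # hypothetical location of the public logs

    def log_status(name, warn_after: 26 * 60 * 60, danger_after: 50 * 60 * 60)
      path = File.join(LOG_DIR, name)
      return {level: 'danger', title: "#{name}: log file missing"} unless File.exist?(path)

      age = (Time.now - File.mtime(path)).to_i
      if age > danger_after
        {level: 'danger',  title: "#{name}: no update for #{age / 3600} hours"}
      elsif age > warn_after
        {level: 'warning', title: "#{name}: no update for #{age / 3600} hours"}
      else
        {level: 'success', title: "#{name}: updated #{age / 60} minutes ago"}
      end
    end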

- Sam Ruby