Posted to dev@whimsical.apache.org by Sam Ruby <ru...@intertwingly.net> on 2017/06/12 23:55:12 UTC

Re: [whimsy] 02/02: Ensure svn is up to date when generating public_podlings.json.

On Mon, Jun 12, 2017 at 7:44 PM,  <jo...@apache.org> wrote:
> ---
>  lib/whimsy/asf/svn.rb         | 11 +++++++++++
>  www/roster/public_podlings.rb |  7 ++++++-
>  2 files changed, 17 insertions(+), 1 deletion(-)
>
> diff --git a/lib/whimsy/asf/svn.rb b/lib/whimsy/asf/svn.rb
> index 134609c..64a596e 100644
> --- a/lib/whimsy/asf/svn.rb
> +++ b/lib/whimsy/asf/svn.rb
> @@ -141,6 +141,17 @@ module ASF
>        return revision, content
>      end
>
> +    def self.updateSimple(path)
> +      cmd = ['svn', 'update', path, '--non-interactive']

This will undoubtedly fail as the $apache::user (www-data) does not
have write access to those directories.

https://github.com/apache/infrastructure-puppet/blob/deployment/modules/whimsy_server/manifests/cronjobs.pp#L89

https://github.com/apache/infrastructure-puppet/blob/deployment/modules/whimsy_server/manifests/init.pp#L116

- Sam Ruby

Re: [whimsy] 02/02: Ensure svn is up to date when generating public_podlings.json.

Posted by sebb <se...@gmail.com>.
On 13 June 2017 at 00:59, John D. Ament <jo...@apache.org> wrote:
> On Mon, Jun 12, 2017 at 7:55 PM Sam Ruby <ru...@intertwingly.net> wrote:
>
>> On Mon, Jun 12, 2017 at 7:44 PM,  <jo...@apache.org> wrote:
>> > ---
>> >  lib/whimsy/asf/svn.rb         | 11 +++++++++++
>> >  www/roster/public_podlings.rb |  7 ++++++-
>> >  2 files changed, 17 insertions(+), 1 deletion(-)
>> >
>> > diff --git a/lib/whimsy/asf/svn.rb b/lib/whimsy/asf/svn.rb
>> > index 134609c..64a596e 100644
>> > --- a/lib/whimsy/asf/svn.rb
>> > +++ b/lib/whimsy/asf/svn.rb
>> > @@ -141,6 +141,17 @@ module ASF
>> >        return revision, content
>> >      end
>> >
>> > +    def self.updateSimple(path)
>> > +      cmd = ['svn', 'update', path, '--non-interactive']
>>
>> This will undoubtedly fail as the $apache::user (www-data) does not
>> have write access to those directories.
>>
>
> Err so should we run cron as whimsysvn ?
>

If it's important to keep the files up to date, why not use svnpubsub
instead of cron for these files?

>>
>>
>> https://github.com/apache/infrastructure-puppet/blob/deployment/modules/whimsy_server/manifests/cronjobs.pp#L89
>>
>>
>> https://github.com/apache/infrastructure-puppet/blob/deployment/modules/whimsy_server/manifests/init.pp#L116
>>
>> - Sam Ruby
>>

Re: Maintenance & docs (was: Ensure svn is up to date when generating public_podlings.json.)

Posted by sebb <se...@gmail.com>.
On 13 June 2017 at 13:31, Shane Curcuru <as...@shanecurcuru.org> wrote:
> Sam Ruby wrote on 6/12/17 9:24 PM:
>> On Mon, Jun 12, 2017 at 9:06 PM, Sam Ruby <ru...@intertwingly.net> wrote:
> ...snip...
>>> I learned all this the hard way on the original whimsy_vm where
>>> directories often got 'wedged' and needed manual intervention for
>>> cleanup.  That's why I instituted a hard separation between what can
>>> be updated in each process.
>>
>> Adding to my answer: this decision (which can be changed if that what
>> we collectively want to do) was to prefer slightly stale data over
>> data that (at best) might occasionally stop updating, and (at worst)
>> can become corrupt.
>>
>> The /srv/svn files update every 10 minutes.  For most purposes, that
>> is fast enough.
>
> General comments:
>
> - This is a per-tool decision.
>
> - We need to ensure each tool has clear maintenance documentation, so
> fixing out of date or bogus data is easy.
>
> - We need to start thinking about how to consistently document these
> kinds of things to our users, since the userbase is increasing for many
> tools (and overall number of tools total!)
>
> -- Many tools are read-only visualizations of data.  Useful information
> is: what does this data mean, how recently was it updated, and where
> specifically did it come from (see also test/dataflow.cgi)
>
> -- Read/Write tools should consider explaining how quickly changes take
> effect, as well as any auth questions: i.e. are there any tools that
> have separate auth from being able to load/see the tool vs. being able
> to change any editable data?
>
> -- Some tools should probably explicitly note in the About This Script
> that they are access-protected.  This will help remind members to not
> share links for some of the member-private data pages, for example.

AFAIK the URLs don't contain any secret information, and don't grant
access without further auth.
Whimsy does not use tokens embedded in the URLs.

> It's not obvious in many cases to users which specific bits of data
> might be member-private vs. committer-private, I think.

However, screenshots could contain private info.

> Any other "explain to the user what this page is" aspects to cover?
>

From a developer point of view it would be good if the pages could
identify their source files.
There's not always unique text that can be grepped to find the source.

This could be done as comments embedded in the HTML or JavaScript,
with the file name derived from the __FILE__ constant.
It's trivial in _html blocks, but elsewhere I could not find how to
generate comments.
For example, _{'text'} does not seem to work everywhere.
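
As a rough illustration of the __FILE__ idea, the sketch below builds an
HTML comment that names a page's source file. The helper name
source_comment and the /srv/whimsy root used in the note afterwards are
hypothetical, not part of whimsy, and how best to emit the string outside
_html blocks remains the open question raised above.

```ruby
require 'pathname'

# Hypothetical helper (not part of whimsy): build an HTML comment that
# names the source file relative to a root directory.
def source_comment(file, root: Dir.pwd)
  path = Pathname.new(File.expand_path(file))
  base = Pathname.new(File.expand_path(root))
  relative = path.relative_path_from(base).to_s
  # "--" is not allowed inside an HTML comment, so split up hyphen runs.
  "<!-- source: #{relative.gsub(/-(?=-)/, '- ')} -->"
end
```

A script could call source_comment(__FILE__, root: '/srv/whimsy') and
write the result into its output; a relative path keeps the comment
short while still being enough to locate the file in the repository.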

>>
>> Programs like the board agenda tool, the secretary mail tool, and now
>> the roster take great care to update svn in separate tmp directories.
>>
>> - Sam Ruby
>>
>
>
> --
>
> - Shane
>   https://www.apache.org/foundation/marks/resources

Maintenance & docs (was: Ensure svn is up to date when generating public_podlings.json.)

Posted by Shane Curcuru <as...@shanecurcuru.org>.
Sam Ruby wrote on 6/12/17 9:24 PM:
> On Mon, Jun 12, 2017 at 9:06 PM, Sam Ruby <ru...@intertwingly.net> wrote:
...snip...
>> I learned all this the hard way on the original whimsy_vm where
>> directories often got 'wedged' and needed manual intervention for
>> cleanup.  That's why I instituted a hard separation between what can
>> be updated in each process.
> 
> Adding to my answer: this decision (which can be changed if that what
> we collectively want to do) was to prefer slightly stale data over
> data that (at best) might occasionally stop updating, and (at worst)
> can become corrupt.
> 
> The /srv/svn files update every 10 minutes.  For most purposes, that
> is fast enough.

General comments:

- This is a per-tool decision.

- We need to ensure each tool has clear maintenance documentation, so
fixing out of date or bogus data is easy.

- We need to start thinking about how to consistently document these
kinds of things for our users, since the user base is growing for many
tools (as is the total number of tools!)

-- Many tools are read-only visualizations of data.  Useful information
is: what does this data mean, how recently was it updated, and where
specifically did it come from (see also test/dataflow.cgi)

-- Read/Write tools should consider explaining how quickly changes take
effect, as well as any auth questions: i.e. are there any tools that
have separate auth from being able to load/see the tool vs. being able
to change any editable data?

-- Some tools should probably explicitly note in the About This Script
that they are access-protected.  This will help remind members to not
share links for some of the member-private data pages, for example.
It's not obvious in many cases to users which specific bits of data
might be member-private vs. committer-private, I think.

Any other "explain to the user what this page is" aspects to cover?


> 
> Programs like the board agenda tool, the secretary mail tool, and now
> the roster take great care to update svn in separate tmp directories.
> 
> - Sam Ruby
> 


-- 

- Shane
  https://www.apache.org/foundation/marks/resources

Re: [whimsy] 02/02: Ensure svn is up to date when generating public_podlings.json.

Posted by sebb <se...@gmail.com>.
On 13 June 2017 at 05:07, Sam Ruby <ru...@intertwingly.net> wrote:
> On Mon, Jun 12, 2017 at 9:54 PM, Sam Ruby <ru...@intertwingly.net> wrote:
>> On Mon, Jun 12, 2017 at 9:44 PM, John D. Ament <jo...@apache.org> wrote:
>>> On Mon, Jun 12, 2017 at 9:24 PM Sam Ruby <ru...@intertwingly.net> wrote:
>>>
>>>> On Mon, Jun 12, 2017 at 9:06 PM, Sam Ruby <ru...@intertwingly.net> wrote:
>>>> > On Mon, Jun 12, 2017 at 7:59 PM, John D. Ament <jo...@apache.org>
>>>> wrote:
>>>> >> On Mon, Jun 12, 2017 at 7:55 PM Sam Ruby <ru...@intertwingly.net>
>>>> wrote:
>>>> >>
>>>> >>> On Mon, Jun 12, 2017 at 7:44 PM,  <jo...@apache.org> wrote:
>>>> >>> > ---
>>>> >>> >  lib/whimsy/asf/svn.rb         | 11 +++++++++++
>>>> >>> >  www/roster/public_podlings.rb |  7 ++++++-
>>>> >>> >  2 files changed, 17 insertions(+), 1 deletion(-)
>>>> >>> >
>>>> >>> > diff --git a/lib/whimsy/asf/svn.rb b/lib/whimsy/asf/svn.rb
>>>> >>> > index 134609c..64a596e 100644
>>>> >>> > --- a/lib/whimsy/asf/svn.rb
>>>> >>> > +++ b/lib/whimsy/asf/svn.rb
>>>> >>> > @@ -141,6 +141,17 @@ module ASF
>>>> >>> >        return revision, content
>>>> >>> >      end
>>>> >>> >
>>>> >>> > +    def self.updateSimple(path)
>>>> >>> > +      cmd = ['svn', 'update', path, '--non-interactive']
>>>> >>>
>>>> >>> This will undoubtedly fail as the $apache::user (www-data) does not
>>>> >>> have write access to those directories.
>>>> >>
>>>> >> Err so should we run cron as whimsysvn ?
>>>> >
>>>> > That's indeed possible, but then it probably can't write to the web
>>>> directory.
>>>> >
>>>> > Also from reading, bad things can happen if two processes are updating
>>>> > the same directory at the same time.  This can be fixed via file
>>>> > locking.  My gitpubsub logic solves this by running the puppet agent
>>>> > itself, and puppet ensures that there is only one agent running at one
>>>> > time.
>>>> >
>>>> > I learned all this the hard way on the original whimsy_vm where
>>>> > directories often got 'wedged' and needed manual intervention for
>>>> > cleanup.  That's why I instituted a hard separation between what can
>>>> > be updated in each process.
>>>>
>>>> Adding to my answer: this decision (which can be changed if that what
>>>> we collectively want to do) was to prefer slightly stale data over
>>>> data that (at best) might occasionally stop updating, and (at worst)
>>>> can become corrupt.
>>>>
>>>> The /srv/svn files update every 10 minutes.  For most purposes, that
>>>> is fast enough.
>>>>
>>>> Programs like the board agenda tool, the secretary mail tool, and now
>>>> the roster take great care to update svn in separate tmp directories.
>>>>
>>> This is a very valuable piece of information.  My main concern isn't roster
>>> but instead the podlings information.
>>>
>>> Shane and I were jokingly talking about this on hipchat - we should switch
>>> all of this to be pubsub.  I'm more convinced that this is correct.
>>
>> You would still need to use flock(*) or equivalent, but definitely doable.
>>
>> The code for pubsub is basically the same for svn as it is for git.
>> The only real difference is that the notification is 'commit' instead
>> of 'push'.
>>
>> https://github.com/apache/whimsy/blob/master/tools/pubsub.rb
>>
>> The other thing to be aware of is that pubsub is only available for
>> publicly readable sources.  So things like foundation and documents
>> can't be done this way.

I wondered about that.

>>
>>> Where's the logic that clones/svn's in a tmp directory?
>>
>> Plenty of places.  Here is one:
>>
>> https://github.com/apache/whimsy/blob/master/www/roster/views/actions/ppmc.json.rb#L71
>>
>> "git grep tmpdir" to find more.
>
> Another thought that should at least work for the podlings.xml case:
>
> podlings_xml = `svn cat https://svn.apache.org/repos/asf/incubator/public/trunk/content/podlings.xml`
>
> No flock.  No temp dirs.  No chance of wedging/corrupting existing directories.

Or just read it as a URL ...

However, SVN does sometimes hiccup, so it might be worth including a
retry, as has been done for rake svn:update in the top-level dir.
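
A minimal sketch of the retry idea combined with reading the file as a
URL. The names with_retries and fetch_url are hypothetical, the attempt
count and delay are arbitrary, and this is not how the existing rake
task is written.

```ruby
require 'net/http'
require 'uri'

# Hypothetical helper: run the block, retrying after a StandardError
# until it succeeds or `attempts` tries have been made.
def with_retries(attempts: 3, delay: 1.0)
  tries = 0
  begin
    tries += 1
    yield
  rescue StandardError
    raise if tries >= attempts   # out of attempts: re-raise the last error
    sleep delay
    retry
  end
end

# Sketch of "read it as a URL"; any non-2xx response counts as a failure.
def fetch_url(url)
  response = Net::HTTP.get_response(URI(url))
  raise "HTTP #{response.code} for #{url}" unless response.is_a?(Net::HTTPSuccess)
  response.body
end
```

Usage would be along the lines of
podlings_xml = with_retries { fetch_url(url) }, so a transient failure
costs a short delay rather than a failed run.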

>>>> - Sam Ruby
>>
>> (*) https://ruby-doc.org/core-2.4.0/File.html#method-i-flock
>
> - Sam Ruby

Re: [whimsy] 02/02: Ensure svn is up to date when generating public_podlings.json.

Posted by Sam Ruby <ru...@intertwingly.net>.
On Mon, Jun 12, 2017 at 9:54 PM, Sam Ruby <ru...@intertwingly.net> wrote:
> On Mon, Jun 12, 2017 at 9:44 PM, John D. Ament <jo...@apache.org> wrote:
>> On Mon, Jun 12, 2017 at 9:24 PM Sam Ruby <ru...@intertwingly.net> wrote:
>>
>>> On Mon, Jun 12, 2017 at 9:06 PM, Sam Ruby <ru...@intertwingly.net> wrote:
>>> > On Mon, Jun 12, 2017 at 7:59 PM, John D. Ament <jo...@apache.org>
>>> wrote:
>>> >> On Mon, Jun 12, 2017 at 7:55 PM Sam Ruby <ru...@intertwingly.net>
>>> wrote:
>>> >>
>>> >>> On Mon, Jun 12, 2017 at 7:44 PM,  <jo...@apache.org> wrote:
>>> >>> > ---
>>> >>> >  lib/whimsy/asf/svn.rb         | 11 +++++++++++
>>> >>> >  www/roster/public_podlings.rb |  7 ++++++-
>>> >>> >  2 files changed, 17 insertions(+), 1 deletion(-)
>>> >>> >
>>> >>> > diff --git a/lib/whimsy/asf/svn.rb b/lib/whimsy/asf/svn.rb
>>> >>> > index 134609c..64a596e 100644
>>> >>> > --- a/lib/whimsy/asf/svn.rb
>>> >>> > +++ b/lib/whimsy/asf/svn.rb
>>> >>> > @@ -141,6 +141,17 @@ module ASF
>>> >>> >        return revision, content
>>> >>> >      end
>>> >>> >
>>> >>> > +    def self.updateSimple(path)
>>> >>> > +      cmd = ['svn', 'update', path, '--non-interactive']
>>> >>>
>>> >>> This will undoubtedly fail as the $apache::user (www-data) does not
>>> >>> have write access to those directories.
>>> >>
>>> >> Err so should we run cron as whimsysvn ?
>>> >
>>> > That's indeed possible, but then it probably can't write to the web
>>> directory.
>>> >
>>> > Also from reading, bad things can happen if two processes are updating
>>> > the same directory at the same time.  This can be fixed via file
>>> > locking.  My gitpubsub logic solves this by running the puppet agent
>>> > itself, and puppet ensures that there is only one agent running at one
>>> > time.
>>> >
>>> > I learned all this the hard way on the original whimsy_vm where
>>> > directories often got 'wedged' and needed manual intervention for
>>> > cleanup.  That's why I instituted a hard separation between what can
>>> > be updated in each process.
>>>
>>> Adding to my answer: this decision (which can be changed if that what
>>> we collectively want to do) was to prefer slightly stale data over
>>> data that (at best) might occasionally stop updating, and (at worst)
>>> can become corrupt.
>>>
>>> The /srv/svn files update every 10 minutes.  For most purposes, that
>>> is fast enough.
>>>
>>> Programs like the board agenda tool, the secretary mail tool, and now
>>> the roster take great care to update svn in separate tmp directories.
>>>
>> This is a very valuable piece of information.  My main concern isn't roster
>> but instead the podlings information.
>>
>> Shane and I were jokingly talking about this on hipchat - we should switch
>> all of this to be pubsub.  I'm more convinced that this is correct.
>
> You would still need to use flock(*) or equivalent, but definitely doable.
>
> The code for pubsub is basically the same for svn as it is for git.
> The only real difference is that the notification is 'commit' instead
> of 'push'.
>
> https://github.com/apache/whimsy/blob/master/tools/pubsub.rb
>
> The other thing to be aware of is that pubsub is only available for
> publicly readable sources.  So things like foundation and documents
> can't be done this way.
>
>> Where's the logic that clones/svn's in a tmp directory?
>
> Plenty of places.  Here is one:
>
> https://github.com/apache/whimsy/blob/master/www/roster/views/actions/ppmc.json.rb#L71
>
> "git grep tmpdir" to find more.

Another thought that should at least work for the podlings.xml case:

podlings_xml = `svn cat https://svn.apache.org/repos/asf/incubator/public/trunk/content/podlings.xml`

No flock.  No temp dirs.  No chance of wedging/corrupting existing directories.
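
A slightly more defensive sketch of the same idea, checking the exit
status instead of silently accepting empty output. The helper name
svn_cat and its command: keyword are hypothetical; the keyword exists
only so the sketch can be exercised on a machine without svn installed.

```ruby
require 'open3'

# URL taken from the message above.
PODLINGS_URL =
  'https://svn.apache.org/repos/asf/incubator/public/trunk/content/podlings.xml'

# Hypothetical helper: return the file content as a String, or raise if
# the command exits with a non-zero status.
def svn_cat(url, command: 'svn')
  # Passing the arguments separately avoids going through a shell.
  stdout, stderr, status =
    Open3.capture3(command, 'cat', '--non-interactive', url)
  raise "#{command} cat failed: #{stderr.strip}" unless status.success?
  stdout
end
```

Calling svn_cat(PODLINGS_URL) reads straight from the repository, so
nothing is written to the shared working copies.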

>>> - Sam Ruby
>
> (*) https://ruby-doc.org/core-2.4.0/File.html#method-i-flock

- Sam Ruby

Re: [whimsy] 02/02: Ensure svn is up to date when generating public_podlings.json.

Posted by Sam Ruby <ru...@intertwingly.net>.
On Mon, Jun 12, 2017 at 9:44 PM, John D. Ament <jo...@apache.org> wrote:
> On Mon, Jun 12, 2017 at 9:24 PM Sam Ruby <ru...@intertwingly.net> wrote:
>
>> On Mon, Jun 12, 2017 at 9:06 PM, Sam Ruby <ru...@intertwingly.net> wrote:
>> > On Mon, Jun 12, 2017 at 7:59 PM, John D. Ament <jo...@apache.org>
>> wrote:
>> >> On Mon, Jun 12, 2017 at 7:55 PM Sam Ruby <ru...@intertwingly.net>
>> wrote:
>> >>
>> >>> On Mon, Jun 12, 2017 at 7:44 PM,  <jo...@apache.org> wrote:
>> >>> > ---
>> >>> >  lib/whimsy/asf/svn.rb         | 11 +++++++++++
>> >>> >  www/roster/public_podlings.rb |  7 ++++++-
>> >>> >  2 files changed, 17 insertions(+), 1 deletion(-)
>> >>> >
>> >>> > diff --git a/lib/whimsy/asf/svn.rb b/lib/whimsy/asf/svn.rb
>> >>> > index 134609c..64a596e 100644
>> >>> > --- a/lib/whimsy/asf/svn.rb
>> >>> > +++ b/lib/whimsy/asf/svn.rb
>> >>> > @@ -141,6 +141,17 @@ module ASF
>> >>> >        return revision, content
>> >>> >      end
>> >>> >
>> >>> > +    def self.updateSimple(path)
>> >>> > +      cmd = ['svn', 'update', path, '--non-interactive']
>> >>>
>> >>> This will undoubtedly fail as the $apache::user (www-data) does not
>> >>> have write access to those directories.
>> >>
>> >> Err so should we run cron as whimsysvn ?
>> >
>> > That's indeed possible, but then it probably can't write to the web
>> directory.
>> >
>> > Also from reading, bad things can happen if two processes are updating
>> > the same directory at the same time.  This can be fixed via file
>> > locking.  My gitpubsub logic solves this by running the puppet agent
>> > itself, and puppet ensures that there is only one agent running at one
>> > time.
>> >
>> > I learned all this the hard way on the original whimsy_vm where
>> > directories often got 'wedged' and needed manual intervention for
>> > cleanup.  That's why I instituted a hard separation between what can
>> > be updated in each process.
>>
>> Adding to my answer: this decision (which can be changed if that what
>> we collectively want to do) was to prefer slightly stale data over
>> data that (at best) might occasionally stop updating, and (at worst)
>> can become corrupt.
>>
>> The /srv/svn files update every 10 minutes.  For most purposes, that
>> is fast enough.
>>
>> Programs like the board agenda tool, the secretary mail tool, and now
>> the roster take great care to update svn in separate tmp directories.
>>
> This is a very valuable piece of information.  My main concern isn't roster
> but instead the podlings information.
>
> Shane and I were jokingly talking about this on hipchat - we should switch
> all of this to be pubsub.  I'm more convinced that this is correct.

You would still need to use flock(*) or equivalent, but definitely doable.
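
A minimal sketch of what the flock part could look like, assuming a
dedicated lock file that every updater agrees to use. The helper name
with_exclusive_lock is hypothetical, and so is the lock file path a
caller would pass in.

```ruby
# Hypothetical helper: hold an exclusive lock on lock_path for the
# duration of the block, so only one process at a time runs the update.
def with_exclusive_lock(lock_path)
  File.open(lock_path, File::RDWR | File::CREAT, 0o644) do |lock|
    lock.flock(File::LOCK_EX)   # waits until no other holder remains
    begin
      yield
    ensure
      lock.flock(File::LOCK_UN)
    end
  end
end
```

The svn update itself would go inside the block. Locking a separate
file rather than the working copy keeps the lock independent of
whatever svn does to the directory contents.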

The code for pubsub is basically the same for svn as it is for git.
The only real difference is that the notification is 'commit' instead
of 'push'.

https://github.com/apache/whimsy/blob/master/tools/pubsub.rb

The other thing to be aware of is that pubsub is only available for
publicly readable sources.  So things like foundation and documents
can't be done this way.

> Where's the logic that clones/svn's in a tmp directory?

Plenty of places.  Here is one:

https://github.com/apache/whimsy/blob/master/www/roster/views/actions/ppmc.json.rb#L71

"git grep tmpdir" to find more.

>> - Sam Ruby

(*) https://ruby-doc.org/core-2.4.0/File.html#method-i-flock

Re: [whimsy] 02/02: Ensure svn is up to date when generating public_podlings.json.

Posted by "John D. Ament" <jo...@apache.org>.
On Mon, Jun 12, 2017 at 9:24 PM Sam Ruby <ru...@intertwingly.net> wrote:

> On Mon, Jun 12, 2017 at 9:06 PM, Sam Ruby <ru...@intertwingly.net> wrote:
> > On Mon, Jun 12, 2017 at 7:59 PM, John D. Ament <jo...@apache.org>
> wrote:
> >> On Mon, Jun 12, 2017 at 7:55 PM Sam Ruby <ru...@intertwingly.net>
> wrote:
> >>
> >>> On Mon, Jun 12, 2017 at 7:44 PM,  <jo...@apache.org> wrote:
> >>> > ---
> >>> >  lib/whimsy/asf/svn.rb         | 11 +++++++++++
> >>> >  www/roster/public_podlings.rb |  7 ++++++-
> >>> >  2 files changed, 17 insertions(+), 1 deletion(-)
> >>> >
> >>> > diff --git a/lib/whimsy/asf/svn.rb b/lib/whimsy/asf/svn.rb
> >>> > index 134609c..64a596e 100644
> >>> > --- a/lib/whimsy/asf/svn.rb
> >>> > +++ b/lib/whimsy/asf/svn.rb
> >>> > @@ -141,6 +141,17 @@ module ASF
> >>> >        return revision, content
> >>> >      end
> >>> >
> >>> > +    def self.updateSimple(path)
> >>> > +      cmd = ['svn', 'update', path, '--non-interactive']
> >>>
> >>> This will undoubtedly fail as the $apache::user (www-data) does not
> >>> have write access to those directories.
> >>
> >> Err so should we run cron as whimsysvn ?
> >
> > That's indeed possible, but then it probably can't write to the web
> directory.
> >
> > Also from reading, bad things can happen if two processes are updating
> > the same directory at the same time.  This can be fixed via file
> > locking.  My gitpubsub logic solves this by running the puppet agent
> > itself, and puppet ensures that there is only one agent running at one
> > time.
> >
> > I learned all this the hard way on the original whimsy_vm where
> > directories often got 'wedged' and needed manual intervention for
> > cleanup.  That's why I instituted a hard separation between what can
> > be updated in each process.
>
> Adding to my answer: this decision (which can be changed if that what
> we collectively want to do) was to prefer slightly stale data over
> data that (at best) might occasionally stop updating, and (at worst)
> can become corrupt.
>
> The /srv/svn files update every 10 minutes.  For most purposes, that
> is fast enough.
>
> Programs like the board agenda tool, the secretary mail tool, and now
> the roster take great care to update svn in separate tmp directories.
>
>
This is a very valuable piece of information.  My main concern isn't roster
but instead the podlings information.

Shane and I were jokingly talking about this on hipchat - we should switch
all of this to be pubsub.  I'm more convinced that this is correct.

Where's the logic that clones/svn's in a tmp directory?


> - Sam Ruby
>

Re: [whimsy] 02/02: Ensure svn is up to date when generating public_podlings.json.

Posted by Sam Ruby <ru...@intertwingly.net>.
On Mon, Jun 12, 2017 at 9:06 PM, Sam Ruby <ru...@intertwingly.net> wrote:
> On Mon, Jun 12, 2017 at 7:59 PM, John D. Ament <jo...@apache.org> wrote:
>> On Mon, Jun 12, 2017 at 7:55 PM Sam Ruby <ru...@intertwingly.net> wrote:
>>
>>> On Mon, Jun 12, 2017 at 7:44 PM,  <jo...@apache.org> wrote:
>>> > ---
>>> >  lib/whimsy/asf/svn.rb         | 11 +++++++++++
>>> >  www/roster/public_podlings.rb |  7 ++++++-
>>> >  2 files changed, 17 insertions(+), 1 deletion(-)
>>> >
>>> > diff --git a/lib/whimsy/asf/svn.rb b/lib/whimsy/asf/svn.rb
>>> > index 134609c..64a596e 100644
>>> > --- a/lib/whimsy/asf/svn.rb
>>> > +++ b/lib/whimsy/asf/svn.rb
>>> > @@ -141,6 +141,17 @@ module ASF
>>> >        return revision, content
>>> >      end
>>> >
>>> > +    def self.updateSimple(path)
>>> > +      cmd = ['svn', 'update', path, '--non-interactive']
>>>
>>> This will undoubtedly fail as the $apache::user (www-data) does not
>>> have write access to those directories.
>>
>> Err so should we run cron as whimsysvn ?
>
> That's indeed possible, but then it probably can't write to the web directory.
>
> Also from reading, bad things can happen if two processes are updating
> the same directory at the same time.  This can be fixed via file
> locking.  My gitpubsub logic solves this by running the puppet agent
> itself, and puppet ensures that there is only one agent running at one
> time.
>
> I learned all this the hard way on the original whimsy_vm where
> directories often got 'wedged' and needed manual intervention for
> cleanup.  That's why I instituted a hard separation between what can
> be updated in each process.

Adding to my answer: this decision (which can be changed if that's what
we collectively want to do) was to prefer slightly stale data over
data that (at best) might occasionally stop updating, and (at worst)
can become corrupt.

The /srv/svn files update every 10 minutes.  For most purposes, that
is fast enough.

Programs like the board agenda tool, the secretary mail tool, and now
the roster take great care to update svn in separate tmp directories.
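
For illustration only, the general shape of the separate tmp directory
approach might look like the sketch below. It is not the code those
tools use; the helper name with_scratch_checkout and its command:
keyword are hypothetical, and the keyword exists only so the sketch can
be exercised without svn installed.

```ruby
require 'open3'
require 'tmpdir'

# Hypothetical helper: check url out into a fresh temporary directory,
# yield that directory, and remove it once the block returns.
def with_scratch_checkout(url, command: 'svn')
  Dir.mktmpdir('svn-scratch') do |dir|
    _out, err, status = Open3.capture3(
      command, 'checkout', '--non-interactive', '--depth', 'files', url, dir
    )
    raise "#{command} checkout failed: #{err.strip}" unless status.success?
    yield dir
  end
end
```

Because each call works in its own directory, a failed or interrupted
run cannot wedge the shared working copies under /srv/svn.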

- Sam Ruby

Re: [whimsy] 02/02: Ensure svn is up to date when generating public_podlings.json.

Posted by Sam Ruby <ru...@intertwingly.net>.
On Mon, Jun 12, 2017 at 7:59 PM, John D. Ament <jo...@apache.org> wrote:
> On Mon, Jun 12, 2017 at 7:55 PM Sam Ruby <ru...@intertwingly.net> wrote:
>
>> On Mon, Jun 12, 2017 at 7:44 PM,  <jo...@apache.org> wrote:
>> > ---
>> >  lib/whimsy/asf/svn.rb         | 11 +++++++++++
>> >  www/roster/public_podlings.rb |  7 ++++++-
>> >  2 files changed, 17 insertions(+), 1 deletion(-)
>> >
>> > diff --git a/lib/whimsy/asf/svn.rb b/lib/whimsy/asf/svn.rb
>> > index 134609c..64a596e 100644
>> > --- a/lib/whimsy/asf/svn.rb
>> > +++ b/lib/whimsy/asf/svn.rb
>> > @@ -141,6 +141,17 @@ module ASF
>> >        return revision, content
>> >      end
>> >
>> > +    def self.updateSimple(path)
>> > +      cmd = ['svn', 'update', path, '--non-interactive']
>>
>> This will undoubtedly fail as the $apache::user (www-data) does not
>> have write access to those directories.
>>
>
> Err so should we run cron as whimsysvn ?

That's indeed possible, but then it probably can't write to the web directory.

Also, from what I have read, bad things can happen if two processes are updating
the same directory at the same time.  This can be fixed via file
locking.  My gitpubsub logic solves this by running the puppet agent
itself, and puppet ensures that there is only one agent running at one
time.

I learned all this the hard way on the original whimsy_vm where
directories often got 'wedged' and needed manual intervention for
cleanup.  That's why I instituted a hard separation between what can
be updated in each process.

- Sam Ruby

Re: [whimsy] 02/02: Ensure svn is up to date when generating public_podlings.json.

Posted by "John D. Ament" <jo...@apache.org>.
On Mon, Jun 12, 2017 at 7:55 PM Sam Ruby <ru...@intertwingly.net> wrote:

> On Mon, Jun 12, 2017 at 7:44 PM,  <jo...@apache.org> wrote:
> > ---
> >  lib/whimsy/asf/svn.rb         | 11 +++++++++++
> >  www/roster/public_podlings.rb |  7 ++++++-
> >  2 files changed, 17 insertions(+), 1 deletion(-)
> >
> > diff --git a/lib/whimsy/asf/svn.rb b/lib/whimsy/asf/svn.rb
> > index 134609c..64a596e 100644
> > --- a/lib/whimsy/asf/svn.rb
> > +++ b/lib/whimsy/asf/svn.rb
> > @@ -141,6 +141,17 @@ module ASF
> >        return revision, content
> >      end
> >
> > +    def self.updateSimple(path)
> > +      cmd = ['svn', 'update', path, '--non-interactive']
>
> This will undoubtedly fail as the $apache::user (www-data) does not
> have write access to those directories.
>

Err, so should we run cron as whimsysvn?


>
>
> https://github.com/apache/infrastructure-puppet/blob/deployment/modules/whimsy_server/manifests/cronjobs.pp#L89
>
>
> https://github.com/apache/infrastructure-puppet/blob/deployment/modules/whimsy_server/manifests/init.pp#L116
>
> - Sam Ruby
>