Posted to dev@subversion.apache.org by Ben Collins-Sussman <su...@collab.net> on 2005/08/01 22:30:36 UTC

Re: logging: are we there yet?

On Jul 27, 2005, at 6:10 PM, Nicolás Lichtmaier wrote:

>
>
>>   2. mod_dav_svn learns to write "high level" logs, in addition to the
>>      existing low-level ones.
>>
>>        --> a number of apache developers have told me about legitimate
>>            methods to write 'custom' svn messages into apache's
>>            accesslog.
>>
>>
>>   3. nothing else writes logs.
>>
>>        --> Logs aren't interesting or important (or trustworthy, or
>>            even necessarily complete) unless we're talking about
>>            server programs talking to clients over a network.
>>
>
>
> I don't like it. It means logging is specific to the way the  
> repository is served.

Exactly.  Logging is a property of the *server*, not the repository.


> Wouldn't it be better to make logging independent of that?

We already discussed some variants of this idea, and ended up not liking them.
Greg Hudson (and others) pointed out that the only trustworthy,  
interesting logging comes from servers.  That's why we're advocating  
a server-centric design here, not a repository-centric design.

For example:  apache might serve a dozen different websites using  
VirtualHost.  We don't see separate logfiles for each site, we see  
logfiles for *apache*.  The same idea would go for svnserve.  It has  
a single logfile, even though it may be serving a dozen  
repositories.  The history of a single repository can be found by  
parsing the log.

With a single logfile, you can view *all* accesses in one place, and  
also extract individual repository access histories.  If each  
repository had its own logfile, then doing a security audit would be  
a real pain;  you'd have to hunt down each logfile and scan it  
separately.
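A minimal Python sketch of the extraction described above, assuming a hypothetical shared-log layout in which every record carries the repository name as its second whitespace-separated field (neither svnserve nor mod_dav_svn is known to use this exact format):

```python
from typing import Iterable, Iterator

# Hypothetical record layout, assumed for illustration only:
#   <timestamp> <repository> <user> <action> <path>


def repository_history(lines: Iterable[str], repo: str) -> Iterator[str]:
    """Yield the records from a shared server log that belong to one repository."""
    for line in lines:
        fields = line.split()
        # Skip blank or malformed lines rather than aborting the whole scan.
        if len(fields) < 5:
            continue
        if fields[1] == repo:
            yield line.rstrip("\n")


# Made-up sample records.
shared_log = [
    "2005-08-01T22:30:36Z projA alice update /trunk\n",
    "2005-08-01T22:31:02Z projB bob commit /branches/1.2\n",
    "2005-08-01T22:32:15Z projA carol checkout /tags/1.0\n",
]

for record in repository_history(shared_log, "projA"):
    print(record)
```

The same single pass could just as easily route each record to a per-repository output, so a shared log does not rule out per-project views.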


> What about repositories which are used with several access methods?
>

Then they get both apache and svnserve logs of activity.  Bonus.

(Besides, this is a small edge case.  How many projects can you name  
that do this?)


> Although I'm not a developer I'd like to propose a different approach:
[...]
> Local file access is logged too! The argument about it having no  
> meaning... has no meaning! The "security" of the log system would  
> be no better than the security of the repository itself. Somebody  
> who uses file:// access doesn't care about security, and he still  
> may want to see how his repository is used.

As Greg Hudson already said, this is why process accounting tools exist.

>
> All of this means logging to simple files. System logging is  
> usually useful for system events. Who would like to see each "svn  
> up" as an informational event in Windows' event viewer?

Many people.  The whole appeal of Greg's proposal is that OS logging  
facilities exist for more than just the OS -- they're supposed to be  
used by "well behaved" applications.  They're accessible and  
parseable by multiple tools.  We're trying to make svnserve into a  
"well behaved" program, rather than have it reinvent the wheel.
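As an illustration of handing records to the OS facility instead of a private file, here is a minimal Python sketch using the standard library's `logging.handlers.SysLogHandler`. It is not svnserve's actual code; the `svnserve:` tag, the logger name, and the message layout are assumptions. A UDP address is the default so the sketch can be exercised without a local daemon; on Windows the analogous standard-library handler is `logging.handlers.NTEventLogHandler`, which needs the pywin32 extensions.

```python
import logging
import logging.handlers
from typing import Tuple, Union


def make_server_logger(address: Union[str, Tuple[str, int]] = ("localhost", 514)) -> logging.Logger:
    """Return a logger that sends records to a syslog daemon rather than a private file.

    A string address such as "/dev/log" selects a local Unix socket; a
    (host, port) tuple selects UDP.
    """
    handler = logging.handlers.SysLogHandler(
        address=address,
        facility=logging.handlers.SysLogHandler.LOG_DAEMON,
    )
    # The "svnserve:" tag and the message layout are illustrative assumptions.
    handler.setFormatter(logging.Formatter("svnserve: %(message)s"))
    # Hypothetical logger name; each call attaches another handler, so call once per process.
    logger = logging.getLogger("svnserve-sketch")
    logger.setLevel(logging.INFO)
    logger.addHandler(handler)
    return logger
```

On most Linux systems, `make_server_logger("/dev/log").info("...")` would reach the local daemon; formatting of timestamps, rotation, and retention then become the syslog daemon's job rather than the application's.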


> [...] Simple file-logging has several advantages: Logs are the same  
> cross-platform, and Subversion is heavily cross-platform.

Subversion is still "cross platform" and "portable".  As mentioned  
earlier, there's nothing "unportable" about using an OS-specific  
service.  That's what APR does already in many cases.  What's  
portable here is that logging is happening on every platform, not the  
means used to achieve the logging.


> Tools can be written to access them, process them and give  
> statistics.

You can argue that it would be convenient to write a single tool  
which parses svnserve logs on every OS.  But then we can
counter-argue that tools to parse win32 and syslogd events *already*
exist and are standard.

The other disadvantage of implementing our own private logfiles is  
the issue of workload:  as discussed in earlier threads, this is not  
a trivial task.  It's a big wheel to reinvent.  We have to worry  
about formatting flexibility, atomic log writes, and log rotation  
too.  Or, we can use existing systems that already do this.
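To make the workload point concrete, here is a minimal Python sketch of just one of those pieces, an atomic append; the function name and record format are hypothetical. It relies on POSIX `O_APPEND` semantics on a local filesystem and still leaves formatting flexibility and rotation unsolved:

```python
import os


def append_record(path: str, record: str) -> None:
    """Append one log record using a single write() on an O_APPEND descriptor.

    With O_APPEND, POSIX moves the offset to end-of-file immediately before
    each write, so concurrent writers on a local filesystem do not overwrite
    each other's records. This guarantee does not hold on NFS, and it does
    nothing about formatting or rotation.
    """
    data = (record.rstrip("\n") + "\n").encode("utf-8")
    fd = os.open(path, os.O_WRONLY | os.O_APPEND | os.O_CREAT, 0o640)
    try:
        written = os.write(fd, data)
        # A short write would split the record across writers; treat it as an error.
        if written != len(data):
            raise OSError(f"short write: {written} of {len(data)} bytes")
    finally:
        os.close(fd)
```

Rotation would need still more machinery (reopening the file on a signal, or coordinating with an external rotator), which is part of the wheel that syslog and Apache already provide.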

It seems like you've countered with the older proposal we discussed  
last week, and all I've done is rehash the same criticisms that made  
us dislike it.  Are you persuaded?



---------------------------------------------------------------------
To unsubscribe, e-mail: dev-unsubscribe@subversion.tigris.org
For additional commands, e-mail: dev-help@subversion.tigris.org


Re: logging: are we there yet?

Posted by Charles Bailey <ba...@gmail.com>.
On 8/5/05, Greg Hudson <gh...@mit.edu> wrote:
> On Wed, 2005-08-03 at 10:57 -0400, Charles Bailey wrote:
> > I'm concerned that this may be too narrow a view of logging (i.e. as a
> > tool for <nontechnical>auditing</nontechnical>).  While it's certainly
> > true that a log writeable by a local svn client is hackable by a
> > local, I suspect most local users aren't malicious.  Given that, I can
> > see reasonable, non-sensitive uses for logging of local activity.
> 
> However, applications do not normally log local activity.  "ls" does not
> log what directories you look at, etc.  Such information might be
> useful, but normally logging of local activity is expected to be done at
> another layer, such as shell histories or process accounting.

I think we may be looking at the situation from different sides.  I
don't disagree that simple applications like 'ls', 'chmod', 'mv' and
the like should be left to process accounting.  On the other hand,
applications like 'dump', 'cron', 'sudo',  'pam', and (perhaps
significantly for this list) 'cvs' do log, or at least provide the
option for it.  In some cases, the concern is rudimentary auditing, in
others it's saving incremental state, and in yet others it's perhaps
tracking resource usage.  My sense is that Subversion falls more into
the latter category than the former.


> I'm not sure if I can provide a compelling justification for this
> attitude in the world of software, but I'm not sure the burden of proof
> is on me.  I don't think Subversion should invent jobs for itself beyond
> what the world normally expects of applications in its space.  Network
> servers are expected to be able to log client requests; random tools are
> not expected to be able to log everything they do.

Again, I don't disagree with your overall conclusion, but I think that
Subversion isn't a "random tool", in this sense. :)  I really don't
think it's a case of inventing a job for Subversion, but one of
deciding whether Subversion will agree to fulfil this (IMHO, but not
necessarily true) reasonable request.  Put a different way, 'svn' is
inherently a client; it just happens in some cases (file://) to handle
its own requests.

That said, my argument is more based in a broad view of logging from
Subversion's perspective than in a broad view of the requirements for
the average program.  To be a bit more clear (I hope):   My sense is
that sufficiently many users would like Subversion to do some sort of
'logical' (i.e. in terms of Subversion operations on Subversion paths)
logging that the developers have agreed it should be implemented.
There are several uses for this.  One of these uses is
<non-technical>auditing</non-technical> usage of repositories; for
this use, completeness of the log is a consideration, and it's tough
to guarantee that in the case of locally writeable repositories. 
Therefore, <non-technical>reliable</non-technical> logs may well
require access via a server.  However, there're still other use cases
(some of which I've  noted before) that don't have this requirement. 
If Subversion's logging can be implemented so as to satisfy both
groups of users, what's the harm?  Perhaps it's  complexity -- if it
becomes a real limiting factor, that in itself may be enough to
justify scrapping "general" logging.  Perhaps it's concern for
confusion about what is and isn't "reliable" -- I'd argue that this is
better addressed by documentation than by omitting the feature. 
Perhaps it's a general concern that the more Subversion does, the more
users will expect -- I'd argue that you haven't crossed the boundary
of what's reasonable yet; burn that bridge when it's behind you. 
Perhaps it's the feeling that Subversion should in general draw its
features narrowly to minimize the chance of future maintenance
headaches  -- this may be a philosophical divergence; I'm essentially
arguing for the most inclusive approach that's practical.

I'm not sure this meets the burden of proof for which you're looking,
but I do hope it helps to clarify why I think logging of file://
activity in Subversion is a reasonable option.

-- 
Regards,
Charles Bailey
Lists: bailey _dot_ charles _at_ gmail _dot_ com
Other: bailey _at_ newman _dot_ upenn _dot_ edu



Re: logging: are we there yet?

Posted by Greg Hudson <gh...@MIT.EDU>.
On Wed, 2005-08-03 at 10:57 -0400, Charles Bailey wrote:
> I'm concerned that this may be too narrow a view of logging (i.e. as a
> tool for <nontechnical>auditing</nontechnical>).  While it's certainly
> true that a log writeable by a local svn client is hackable by a
> local user, I suspect most local users aren't malicious.  Given that, I can
> see reasonable, non-sensitive uses for logging of local activity.

However, applications do not normally log local activity.  "ls" does not
log what directories you look at, etc.  Such information might be
useful, but normally logging of local activity is expected to be done at
another layer, such as shell histories or process accounting.

I'm not sure if I can provide a compelling justification for this
attitude in the world of software, but I'm not sure the burden of proof
is on me.  I don't think Subversion should invent jobs for itself beyond
what the world normally expects of applications in its space.  Network
servers are expected to be able to log client requests; random tools are
not expected to be able to log everything they do.



Re: logging: are we there yet?

Posted by Charles Bailey <ba...@gmail.com>.
On 8/1/05, Ben Collins-Sussman <su...@collab.net> wrote:
> 
> On Jul 27, 2005, at 6:10 PM, Nicolás Lichtmaier wrote:
> 
> > Although I'm not a developer I'd like to propose a different approach:
> [...]
> > Local file access is logged too! The argument about it having no
> > meaning... has no meaning! The "security" of the log system would
> > be no better than the security of the repository itself. Somebody
> > who uses file:// access doesn't care about security, and he still
> > may want to see how his repository is used.
> 
> As Greg Hudson already said, this is why process accounting tools exist.

<disclaimer>
I've been off list for a couple weeks, and am catching up in bulk, so
please take this as an apology-before-the-fact if I'm rehashing a
point that's been argued in detail in the list already.
</disclaimer>

I'm concerned that this may be too narrow a view of logging (i.e. as a
tool for <nontechnical>auditing</nontechnical>).  While it's certainly
true that a log writeable by a local svn client is hackable by a
local user, I suspect most local users aren't malicious.  Given that, I can
see reasonable, non-sensitive uses for logging of local activity.  For
example, it might be useful to know which parts of a repo see the most
activity in order to adjust backup schedules, or the least activity in
order to decide what to archive.  Watching patterns of access might
also help guide decisions about splitting or merging repos, updating
hooks, and the like.   None of this is particularly sensitive; the
worst you'd get from a hacked log is a suboptimal repo structure.
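The backup-schedule idea can be sketched in a few lines of Python, assuming a hypothetical log from which the repository-relative path of each access has already been extracted; `activity_by_area` and its `depth` parameter are illustrative names, not part of any Subversion tool:

```python
from collections import Counter
from typing import Iterable


def activity_by_area(paths: Iterable[str], depth: int = 1) -> Counter:
    """Count accesses per repository area, keyed by the first `depth` path components."""
    counts: Counter = Counter()
    for path in paths:
        parts = [part for part in path.split("/") if part]
        # Accesses to the repository root itself are counted under "/".
        key = "/" + "/".join(parts[:depth]) if parts else "/"
        counts[key] += 1
    return counts


# Paths are made up for illustration.
accessed = ["/trunk/src/main.c", "/trunk/doc", "/branches/1.2/README", "/", "/trunk"]
busiest = activity_by_area(accessed).most_common()
# [('/trunk', 3), ('/branches', 1), ('/', 1)]
```

With `depth=2`, the same call separates individual branches (e.g. `/branches/1.2`), so rarely touched ones could be flagged for archiving.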

Depending on the process accounting tools available, this kind of
information might be tough to recover from the outside.  (For
instance, noting that the svn executable was invoked, or even that
/path/to/repo was hit, doesn't help all that much if one can't recover
the URL to see what was hit inside the repo.)  It also seems a bit like
the proverbial elephant gun for the fly: recording everything that
select (all?) users do in order to better manage svn repos may become
unwieldy.

I understand that, under the current proposal, the interested sysadmin
can simply switch over from file:// access to svn:// access, so this
isn't a showstopper.  Therefore, if logging direct access is a major
pain to code, that may be reason enough to skip it.  But it does feel
(to me) somewhat incomplete.  I think there's advantage, in mindshare
if not in technical capability, to giving the feature as broad a scope
as possible, and letting people restrict it as needed, rather than to
giving the feature as safe a scope as possible, and letting people work
around its limits for less secure use cases.

Ya $0.02.

-- 
Regards,
Charles Bailey
Lists: bailey _dot_ charles _at_ gmail _dot_ com
Other: bailey _at_ newman _dot_ upenn _dot_ edu



Re: logging: are we there yet?

Posted by Michael Sweet <mi...@easysw.com>.
Ben Collins-Sussman wrote:
> 
> On Aug 1, 2005, at 7:03 PM, Michael Sweet wrote:
> 
>>
>>     1. Different people manage different repos, and you might not
>>        want everyone to have access to every project's log data.
> 
> 
> ? There's only one server administrator, isn't there?  Since when are  
> users able to examine the repository database directly?

I'm looking at the SVN repo log as belonging to the administrator
of the SVN repo, which may not be the same person that administers
the physical server (think Collab.net... :)

>>     2. Some repos are more active than others, and so you might want
>>        to rotate/process/audit a busy project's log file more often
>>        than a slow project.
>>
>>     3. By putting everything in one file, you force the data to be
>>        separated N times instead of 1 time.
>>
> 
> I'm not sure I understand why these are good or bad things.

I'm not saying good-or-bad here, I am pointing out why I want
support for separate log files.

>>     4. By putting everything in one file, you force the server to
>>        know some notion of where the repository is located, i.e.
>>        if someone checks out http://server/project/foo/trunk,
>>        mod_svn needs to log the full repository path instead of
>>        the relative path within the repo.
> 
> 
> The server always knows where each repository is located;  it couldn't
> open the repository otherwise.  :-)  You're right, with a shared
> logfile, the names of the repositories would have to be logged, so that
> we can tell the logs apart, that's a given.

Right, that increases overhead (disk space, extra processing overhead
when doing reports, etc.), but it might also add confusion if you are
logging the physical disk location of the repo vs. the URL that was
used to access it.

>> So, if I may so humbly request that at least mod_svn support
>> logging to separate files - Apache already supports that...
> 
> 
> 
> We're not planning on making any changes to apache/mod_dav_svn's
> logging mechanism.  We're only going to make it log *more* things.

OK.

-- 
______________________________________________________________________
Michael Sweet, Easy Software Products           mike at easysw dot com
Internet Printing and Document Software          http://www.easysw.com


Re: logging: are we there yet?

Posted by Ben Collins-Sussman <su...@collab.net>.
On Aug 1, 2005, at 7:03 PM, Michael Sweet wrote:

>
>     1. Different people manage different repos, and you might not
>        want everyone to have access to every project's log data.

? There's only one server administrator, isn't there?  Since when are  
users able to examine the repository database directly?


>
>     2. Some repos are more active than others, and so you might want
>        to rotate/process/audit a busy project's log file more often
>        than a slow project.
>
>     3. By putting everything in one file, you force the data to be
>        separated N times instead of 1 time.
>

I'm not sure I understand why these are good or bad things.


>     4. By putting everything in one file, you force the server to
>        know some notion of where the repository is located, i.e.
>        if someone checks out http://server/project/foo/trunk,
>        mod_svn needs to log the full repository path instead of
>        the relative path within the repo.

The server always knows where each repository is located;  it  
couldn't open the repository otherwise.  :-)  You're right, with a  
shared logfile, the names of the repositories would have to be  
logged, so that we can tell the logs apart, that's a given.

>
> So, if I may so humbly request that at least mod_svn support
> logging to separate files - Apache already supports that...


We're not planning on making any changes to apache/mod_dav_svn's
logging mechanism.  We're only going to make it log *more* things.



Re: logging: are we there yet?

Posted by Michael Sweet <mi...@easysw.com>.
Ben Collins-Sussman wrote:
> ...
> For example:  apache might serve a dozen different websites using  
> VirtualHost.  We don't see separate logfiles for each site, we see  
> logfiles for *apache*.  The same idea would go for svnserve.  It has a
> single logfile, even though it may be serving a dozen repositories.
> The history of a single repository can be found by parsing the log.

Actually, that example doesn't hold up because you can (and AFAIK most
people do) use separate log files for each VirtualHost, Location, etc.

> With a single logfile, you can view *all* accesses in one place, and  
> also extract individual repository access histories.  If each  
> repository had its own logfile, then doing a security audit would be a
> real pain;  you'd have to hunt down each logfile and scan it separately.

Actually, separate log files would be more useful for me.

Some thoughts:

     1. Different people manage different repos, and you might not
        want everyone to have access to every project's log data.

     2. Some repos are more active than others, and so you might want
        to rotate/process/audit a busy project's log file more often
        than a slow project.

     3. By putting everything in one file, you force the data to be
        separated N times instead of 1 time.

     4. By putting everything in one file, you force the server to
        know some notion of where the repository is located, i.e.
        if someone checks out http://server/project/foo/trunk,
        mod_svn needs to log the full repository path instead of
        the relative path within the repo.

So, if I may so humbly request that at least mod_svn support
logging to separate files - Apache already supports that...

-- 
______________________________________________________________________
Michael Sweet, Easy Software Products           mike at easysw dot com
Internet Printing and Publishing Software        http://www.easysw.com


Re: logging: are we there yet?

Posted by Ben Collins-Sussman <su...@collab.net>.
On Aug 2, 2005, at 9:21 AM, C. Michael Pilato wrote:
>
> Ben, this may not be what you were talking about here, but I still
> think that a mod_dav_svn logging subsystem should understand
> httpd.conf directives for the log file locations and the log level,
> just like mod_rewrite does.  Doing so allows administrators to decide
> if they want all their mod_dav_svn Subversion logs in one place, or if
> they want to distribute them based on URLs.  With Apache's cascading
> configuration support, there's no point in *not* giving this
> flexibility.

I think everyone is assuming that we're planning to add a whole  
"subsystem" to mod_dav_svn;  that was never part of the plan.  The  
plan was to make mod_dav_svn write "client did an update" to the  
standard apache accesslog, nothing more.

But, it sounds like people want us to make mod_dav_svn more complex  
than that... to be able to write its own private logs as well.  Once  
again, I ask:  is it worth reinventing logging functionality?   
Figuring out how to write to logs atomically, creating new httpd  
directives, worrying about rotation.  Sure, I guess we could imitate  
whatever mod_rewrite is doing, but it sort of defeats the whole goal  
of not reinventing logging.




Re: logging: are we there yet?

Posted by "C. Michael Pilato" <cm...@collab.net>.
Ben Collins-Sussman <su...@collab.net> writes:

> > Wouldn't it be better to make logging independent of that?
> 
> We already discussed some variants of this idea, and ended up not liking them.
> Greg Hudson (and others) pointed out that the only trustworthy,
> interesting logging comes from servers.  That's why we're advocating
> a server-centric design here, not a repository-centric design.
> 
> For example:  apache might serve a dozen different websites using
> VirtualHost.  We don't see separate logfiles for each site, we see
> logfiles for *apache*.  The same idea would go for svnserve.  It has
> a single logfile, even though it may be serving a dozen repositories.
> The history of a single repository can be found by parsing the log.
> 
> With a single logfile, you can view *all* accesses in one place, and
> also extract individual repository access histories.  If each
> repository had its own logfile, then doing a security audit would be
> a real pain;  you'd have to hunt down each logfile and scan it
> separately.

Ben, this may not be what you were talking about here, but I still
think that a mod_dav_svn logging subsystem should understand
httpd.conf directives for the log file locations and the log level,
just like mod_rewrite does.  Doing so allows administrators to decide
if they want all their mod_dav_svn Subversion logs in one place, or if
they want to distribute them based on URLs.  With Apache's cascading
configuration support, there's no point in *not* giving this
flexibility.

Your security audit complaint is bogus.  Log file locations will never
be dynamically generated -- if they aren't listed in some Apache
configuration, they don't get used.  So the set of logs can be
obtained by grepping for 'SVNAccessLog' and 'SVNErrorLog' in
httpd.conf and its included configuration files.
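The grep above can be sketched in Python as follows. `SVNAccessLog` and `SVNErrorLog` are the directive names proposed in this message, not directives mod_dav_svn is known to implement, and the `Include` handling is simplified (relative paths are resolved against the including file's directory, whereas httpd resolves them against `ServerRoot`):

```python
import glob
import os
import re
from typing import List, Optional, Set, Tuple

# Proposed (hypothetical) directive names from this thread.
LOG_DIRECTIVE = re.compile(r"^\s*(SVNAccessLog|SVNErrorLog)\s+(\S+)", re.IGNORECASE)
INCLUDE = re.compile(r"^\s*Include(?:Optional)?\s+(\S+)", re.IGNORECASE)


def find_svn_logs(conf_path: str, seen: Optional[Set[str]] = None) -> List[Tuple[str, str]]:
    """Return (directive, log path) pairs from a config file and the files it includes."""
    seen = set() if seen is None else seen
    conf_path = os.path.abspath(conf_path)
    if conf_path in seen:  # guard against include loops
        return []
    seen.add(conf_path)
    found: List[Tuple[str, str]] = []
    base = os.path.dirname(conf_path)
    with open(conf_path, encoding="utf-8") as conf:
        for line in conf:
            match = LOG_DIRECTIVE.match(line)
            if match:
                # Quoted paths containing spaces are not handled in this sketch.
                found.append((match.group(1), match.group(2).strip('"')))
                continue
            match = INCLUDE.match(line)
            if match:
                # Simplification: real httpd resolves relative Include paths against ServerRoot.
                pattern = os.path.join(base, match.group(1).strip('"'))
                for included in sorted(glob.glob(pattern)):
                    found.extend(find_svn_logs(included, seen))
    return found
```

Because log locations only come from configuration, a scan like this enumerates every log file an audit would need to read.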

(And also, have you seen our own red-bean.com Apache configuration?
Lots of virtualhosts, each with their own access, ssl-access, and
error files).

> > What about repositories which are used with several access methods?
> >
> 
> Then they get both apache and svnserve logs of activity.  Bonus.
> 
> (Besides, this is a small edge case.  How many projects can you name
> that do this?)

Yeah, -1 on trying to make mod_dav_svn and svnserve share logs.
svnserve, as a Unix-y daemon, can use the Unix-y (syslog) logging
semantics.  mod_dav_svn, as an Apache module, uses Apache logging
semantics.

