Posted to dev@subversion.apache.org by Jonathan Gilbert <o2...@sneakemail.com> on 2005/10/30 04:13:30 UTC

[PROPOSAL] Return of the (svnserve) log

Hello,

I've been using SVN for a while now and I realized suddenly, "svnserve is
quite similar to httpd, so why doesn't it log events like an httpd does?"
Actually, I've been subscribed to the dev list for a while, so the real
thought was "Logging facilities for svnserve were discussed in July, so why
doesn't it *already* log events like an httpd does?" I brought up the issue
on the IRC channel and a discussion ensued in which I learned that Sussman
had been planning to implement it, but only finished mod_dav_svn before the
"Big 1.3 Push". Before he could sink his teeth into the svnserve half of
it, he woke up one morning and found himself an employee of Google instead
of CollabNet :-) As a result, Sussman no longer has time to implement the
logging facility for svnserve. I was thinking I might be up to the task :-)

I'm going to present a list of features that I thought would be good for a
base implementation. I know there is a strong urge to K.I.S.S., but I will
present a rationale for each of the features, and in any event, none of
these features are particularly difficult to implement properly.

The basic concept of logging is, as I started out saying, the same as that
for a web server such as Apache. There are a number of classes of logs, of
which at least two are "access" and "error" (to use the Apache
terminology). The "access" log shows successful day-to-day transactions,
such as checkout/export, commit, mkdir, list, what have you. The "error"
log shows the same kind of actions, but indicates their failure. For
instance, if I attempt to list a directory which doesn't exist, the log
entry for that failed request goes into the "error" log, not the "access"
log. Sussman also mentioned that a third class of log should be
implemented: "authorization", so that those events which directly pertain
to security can be painlessly split off.

What follows is the list of features that I feel are appropriate, with a
brief explanation of why I think each one should be the way it is:

------------------------------------------------------------------------
1. Possible targets: syslogd, Windows Event Log service, and flat files.

Sussman told me on IRC that the previous discussion of the topic had
decided that flat files would probably be redundant, as syslogd can simply
be configured to redirect the SVN log entries to their own file.

It turns out, however, that the Windows Event Log service does *not*
support this. The closest it can come is allocating a (binary) .evt file
for svnserve. These .evt files have a maximum size, and when the size is
reached, they are not automatically rolled. One of three possible
behaviours can be selected:

1) Old events will be overwritten as needed.
2) Events older than a certain date will be deleted en masse to make room.
3) New events are not logged.

Obviously, the Windows Event Log service is designed for an entirely
different class of logging, where events are expected to be few in number
and relatively infrequent. Microsoft's documentation of the service states:

    Event logging consumes resources such as disk space and processor time.
    The amount of disk space that an event log requires and the overhead for
    an application that logs events depend on how much information you choose
    to log. This is why it is important to log only essential information. It
    is also good to place event logging calls in an error path in the code
    rather than in the main code path, which would reduce performance.

David Anderson mentioned to me that syslogd has a limited number of
"facilities", and that these are what are used by syslogd to split events
off to different files. As such, multiple repositories being handled by
svnserve would all have their log entries sent to the same place. I have
also seen mention in the SVN dev list logs that some people do not wish to
run syslogd for whatever reason. In order to accommodate these people as
well as those using Windows and not force our log messages onto the
available syslog facility codes, I believe it is important to support
directly writing to flat files.

------------------------------------------------------------------------
2. Classes of events: auth, access, and error.

Different system administrators will be interested in different things.
Some people are interested in knowing precisely who is communicating from
where and what user they are purporting to be. Other people are interested
to know what parts of their repository are being accessed the most.
Probably a fair number of people are interested in being able to detect
attacks on their server, which could take the form of denial of service.
For these reasons, I believe it is important to divide logs up primarily
into these 3 categories:

- Auth events (Authentication & Authorization), which involve people
identifying themselves to the server and requesting resources, would
indicate what user account, if any, the user had provided, what IP address
they were connecting from, and, most importantly, whether the attempt was
successful. Failed authorization attempts (attempting to write when the
repository is read-only) would also indicate which resource the user had
attempted to access without authority.

- Access events, which involve people successfully working with the server
using day-to-day functions like "checkout", "commit", "list", "mkdir",
would indicate which user & IP address the request had come from, what the
request type was, and which resource the request involved. If there are
some options which the server can discern for certain requests (such as
perhaps a request for recursion), these should also be noted if they are
available.

- Error events, which involve people who have successfully authenticated
with the server asking it to perform an action it cannot or will not do,
would indicate similar information as access events but also indicate the
cause of the failure, perhaps through the use of a status code.

Of course, not everyone will want error events split off from access
events. Functionality for this is discussed in feature #4 below :-)

------------------------------------------------------------------------
3. Common file format for all plain text log data.

While auth, access and error events do not log precisely the same sets of
information, it should be (as discussed in feature #4 below) at least
possible for an administrator to combine all log information into a single
file. When web servers proliferated and came under widespread use, tools
emerged for analyzing the log files produced by servers and providing
statistics and other analyses. While svnserve is less likely to attract
such tools, it doesn't cost us anything to at least use a common format
when logging any class of event.

In order to be human-readable, such a format should be plain ASCII text,
similar in nature to a web server's logs (this is only an issue on Windows,
where the Event Log service allows arbitrary binary data to be logged). In
order to be machine-parseable, the format should have a fixed number of
fields delimited by spaces, and fields whose content could potentially
contain spaces should (always) be enclosed in quotation marks, with some
provision for escaping (I'm thinking of URLs here, primarily, and as auth
events would not necessarily include a URL, the field could be encoded
using two adjacent quotes in the file ""). The exact format of the log
messages will depend on precisely which data is available, which is
something I will determine when I review svnserve's architecture and the
existing logging facility added to mod_dav_svn by Sussman. It will,
however, most certainly include a date & timestamp.
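
As a rough illustration of the kind of format described above, here is a
minimal Python sketch. The function names, the field order, the timestamp
layout and the backslash escaping convention are all assumptions made for
the example; the actual fields would depend on what svnserve has available.

```python
from datetime import datetime, timezone

def quote_field(value):
    """Wrap a field in double quotes, escaping backslashes and quotes."""
    escaped = value.replace("\\", "\\\\").replace('"', '\\"')
    return '"' + escaped + '"'

def format_log_line(when, event_class, user, address, command, resource=""):
    """Build one space-delimited log line with a fixed number of fields.

    The field set here is hypothetical. Fields that may contain spaces are
    always quoted; a missing resource is written as two adjacent quotes.
    """
    fields = [
        when.strftime("%Y-%m-%dT%H:%M:%SZ"),  # timestamp without spaces
        event_class,                          # "auth", "access" or "error"
        quote_field(user),
        address,
        command,
        quote_field(resource),                # "" when there is no resource
    ]
    return " ".join(fields) + "\n"
```

With this layout an auth event and an access event have the same number of
fields, so a single parser can read a combined file.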

------------------------------------------------------------------------
4. Config file syntax to allow multiple classes of events to be logged to
the same flat file.

As mentioned earlier, some people (myself, for instance) will inevitably
wish to have a consolidated log, so a mechanism to allow multiple logs to
be directed at the same file is required. While this could be done by
simply requesting the same filename:

error-log = access_and_error.log
access-log = access_and_error.log

.. the appearance of this in the config file raises questions in the user's
head: Will the server be smart enough to canonicalize the paths & check
that they are the same file, or will it open the same file with two
separate handles and completely mangle the resulting log data? In addition,
the implementation of this kind of checking is troublesome to say the least.

In order to simplify implementation and remove this dubious appearance from
the config file, I propose the following syntax:

logfile-1 = access_and_error.log
logfile-2 = auth.log

error-log = file 1
access-log = file 1
auth-log = file 2

The precise format of the right-hand-side of each "-log" entry would be to
allow one of the following:

"file N", to use the file indicated by the "logfile-N" directive,
"syslog", to use the UNIX syslog facility (an error on unsupporting systems),
"WindowsEventLog", to use the Windows API (an error when not on Windows).

This syntax suggests a level of abstraction between the event sinks and the
output mechanisms, which I believe is the best way to implement the
functionality.
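
To make the indirection concrete, here is a small Python sketch of how the
"-log" entries might be resolved against the "logfile-N" directives. It
assumes the config has already been read into a dictionary of strings; the
function name and the tuple representation of a sink are hypothetical.

```python
def resolve_log_sinks(config):
    """Map each "*-log" key to a sink, sharing one path per logfile-N."""
    files = {}  # N -> path, taken from the "logfile-N" directives
    for key, value in config.items():
        if key.startswith("logfile-"):
            files[int(key[len("logfile-"):])] = value

    sinks = {}
    for key, value in config.items():
        if not key.endswith("-log"):
            continue
        words = value.split()
        if len(words) == 2 and words[0] == "file":
            number = int(words[1])
            if number not in files:
                raise ValueError("%s refers to undefined logfile-%d"
                                 % (key, number))
            sinks[key] = ("file", files[number])
        elif value in ("syslog", "WindowsEventLog"):
            sinks[key] = (value, None)
        else:
            raise ValueError("unrecognized target for %s: %r" % (key, value))
    return sinks
```

Because two event classes that name "file 1" resolve to the same entry, the
server can open that file once and never needs to compare paths.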

------------------------------------------------------------------------
5. Configurable behaviour for failure to log an event.

Some people are interested in logged information for important security
reasons; they will see it as an audit trail. Other users of SVN, such as
myself, will be interested purely for informational purposes.

When an audit trail is being produced and the target device becomes full or
otherwise unable to accommodate a log entry, everything must grind to a
terrifying halt, because it would be completely unacceptable to permit
events to proceed without logging them when the administrator has
specifically requested an audit trail. However, if the log information is
not being considered a vital source of information about the behaviour
patterns of those with access to the repository, it would be inappropriate
to deny service in the event that actions cannot be logged.

Therefore, I propose a property to be applied independently to classes of
logs which makes that class guarantee an audit. Disabled by default, this
property would make svnserve refuse to handle requests if it failed to log
them.

I propose the following name for the property in the config file:

auth-log-auditing = on
access-log-auditing = off

If auditing is disabled and logging fails, I propose that svnserve first
attempt to directly log the failure (not the event itself) to syslog, and
if that fails, write it to stderr, which may or may not show up on the
system's console. Understand that this is a last resort :-)
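
The intended behaviour can be sketched roughly as follows in Python. The
names log_event and LogFailure, and the idea of passing the fallback
reporters in as callables, are assumptions for the sake of the example and
not a proposed API.

```python
import sys

class LogFailure(Exception):
    """Raised when an audited log class could not record an event."""

def log_event(write_entry, entry, auditing, fallbacks=()):
    """Try to log `entry`; on failure either refuse or fall back.

    write_entry: callable that writes one entry, raising OSError on failure.
    auditing:    True if this log class must guarantee an audit trail.
    fallbacks:   callables (e.g. a syslog writer) tried in order to report
                 the failure itself, not the original event.
    """
    try:
        write_entry(entry)
        return
    except OSError as err:
        if auditing:
            # The caller is expected to refuse the request.
            raise LogFailure("audited log unavailable: %s" % err)
        notice = "svnserve: could not write log entry: %s\n" % err
    for report in fallbacks:
        try:
            report(notice)
            return
        except OSError:
            continue
    sys.stderr.write(notice)  # last resort
```

A request handler would call log_event() before doing the work and treat
LogFailure as a reason to reject the request.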

These 5 features seem to me fundamental to any properly functional & usable
logging system for svnserve. If I've missed anything important, just let me
know, of course :-) I'm interested to hear everyone's thoughts on what I've
written here and on logging in svnserve.

Jonathan Gilbert


---------------------------------------------------------------------
To unsubscribe, e-mail: dev-unsubscribe@subversion.tigris.org
For additional commands, e-mail: dev-help@subversion.tigris.org

Re: [PROPOSAL] Return of the (svnserve) log

Posted by Jonathan Gilbert <o2...@sneakemail.com>.

At 01:49 AM 31/10/2005 -0500, Greg Hudson wrote:
[snip]
>I believe that for regular files on a local filesystem, write() does all
>or nothing.  I don't have an authoritative source, since I didn't manage
>to pull up either the SUS or POSIX standards on the web easily.

This is what I want to be sure about. :-)

I've tracked down the SUS standard and located its entry on write(). Here
are the pertinent bits:

  If a write() requests that more bytes be written than there is room for (for
  example, [XSI] the process' file size limit or the physical end of a
  medium), only as many bytes as there is room for shall be written. For
  example, suppose there is space for 20 bytes more in a file before reaching
  a limit. A write of 512 bytes will return 20. The next write of a non-zero
  number of bytes would give a failure return (except as noted below).

And:

  If write() is interrupted by a signal after it successfully writes some
  data, it shall return the number of bytes written.

So, basically, there are ways in which a write() could succeed only
partially. The file size limit is probably not so important, as other
attempts from other threads/processes will encounter the same lack of quota
or free space and receive an error with no data output at all. More
troubling is that a signal which hits the thread doing the write() can, if
I'm reading correctly, split the operation.

The chances of this actually happening for write()s of the size needed for
logging seem pretty slim to me, but they certainly cannot be said to be
zero without investigating the implementation. =/
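
For what it's worth, a short write is at least detectable from the return
value. Here is a minimal Python sketch (os.write is a thin wrapper around
write(); the helper name is made up for the example):

```python
import os

def write_entry(fd, line):
    """Issue exactly one write() and report whether it was complete.

    Returns (complete, written): `written` is the byte count reported by
    write(), which may be smaller than the length of the encoded line.
    """
    data = line.encode("ascii")
    written = os.write(fd, data)
    return written == len(data), written
```

What to do with an incomplete entry (retry the remainder, or report it) is a
separate policy question that this sketch does not try to answer.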

Anyway, I suppose other important people live with those odds. Apache isn't
known for producing broken log files even though it typically has dozens of
forked processes all potentially vying to log.

>If a filesystem gives up on returning an error from write() on disk
>full, then pretty much every application is sunk when it comes to
>graceful filled disk recovery.  Subversion won't be unusual in that
>regard.

Hehe, true. The buggy filesystems listed in the Linux man page are probably
experimental versions of the Minix filesystem driver used before the ext
filesystem was first put together, or something :-)

[snip]
>At any rate, locking or fsyncing for each log message would be a
>performance killer, so even if there are edge cases for simply opening
>for append and writing, I think we should do it anyway.

Okay. This is the simplest path to implement anyway :-) If people start
coming to us saying "1.4 produces broken log files", then we can start
investigating locking or possibly some other solution. Until such time,
we'll do without them :-)

Jonathan Gilbert

Re: [PROPOSAL] Return of the (svnserve) log

Posted by Michael Sinz <Mi...@sinz.org>.

Jonathan Gilbert wrote:
> At 02:35 PM 30/10/2005 -0500, Michael Sinz wrote:
> 
>>Jonathan Gilbert wrote:
>>[...]
>>
>>>Another minor issue is that the Linux man page for write() (section 2)
>>>indicates that there exist filesystems where write() doesn't even guarantee
>>>that space on the device has been reserved for the data, let alone that
>>>the data has been written. If the two file handles don't know about each
>>>other, then we also need to fsync() after every write(), and this creates a
>>>race condition in the absence of synchronization.
>>
>>Those two file handles due to symlinks to the same file will actually be
>>handled at the VFS/filesystem layer since they will open the same inode.
>>Even if the path is different and the name is different, filehandles really
>>only care about the inode it is talking to.  So even hardlinks will be
>>handled correctly.  (Softlinks are even more interesting since the VFS layer
>>does all of the softlink translation thus the final actual open operation
>>runs against the final target file/inode)
> 
> 
> This may be the case on a specific implementation, but is it a portable
> guarantee? I'm not sure where to check...

For softlinks that is basically POSIX.  On hard links, they are directory
entries that point to the same inode.  The only platform this is unclear for
that I know of is Windows.  (Ok, maybe OS/400 - it has been many years since
I played with an AS/400 machine)

I windows had the same behavior but if I remember correctly, when you get a
write handle on a specific file, Windows tends to be much more restrictive as
to being able to get another, even from within the same process.  But I have
not tested this and don't normally do software development for Windows.

-- 
Michael Sinz                     Technology and Engineering Director/Consultant
"Starting Startups"                                mailto:michael.sinz@sinz.org
My place on the web                            http://www.sinz.org/Michael.Sinz

Re: [PROPOSAL] Return of the (svnserve) log

Posted by Jonathan Gilbert <o2...@sneakemail.com>.

At 02:35 PM 30/10/2005 -0500, Michael Sinz wrote:
>Jonathan Gilbert wrote:
>[...]
>> Another minor issue is that the Linux man page for write() (section 2)
>> indicates that there exist filesystems where write() doesn't even guarantee
>> that space on the device has been reserved for the data, let alone that
>> the data has been written. If the two file handles don't know about each
>> other, then we also need to fsync() after every write(), and this creates a
>> race condition in the absence of synchronization.
>
>Those two file handles due to symlinks to the same file will actually be
>handled at the VFS/filesystem layer since they will open the same inode.
>Even if the path is different and the name is different, filehandles really
>only care about the inode it is talking to.  So even hardlinks will be
>handled correctly.  (Softlinks are even more interesting since the VFS layer
>does all of the softlink translation thus the final actual open operation
>runs against the final target file/inode)

This may be the case on a specific implementation, but is it a portable
guarantee? I'm not sure where to check...

Jonathan Gilbert

Re: [PROPOSAL] Return of the (svnserve) log

Posted by Michael Sinz <Mi...@sinz.org>.

Jonathan Gilbert wrote:
[...]
> Another minor issue is that the Linux man page for write() (section 2)
> indicates that there exist filesystems where write() doesn't even guarantee
> that space on the device has been reserved for the data, let alone that
> the data has been written. If the two file handles don't know about each
> other, then we also need to fsync() after every write(), and this creates a
> race condition in the absence of synchronization.

Those two file handles due to symlinks to the same file will actually be
handled at the VFS/filesystem layer since they will open the same inode.
Even if the path is different and the name is different, filehandles really
only care about the inode they are talking to.  So even hardlinks will be
handled correctly.  (Softlinks are even more interesting since the VFS layer
does all of the softlink translation thus the final actual open operation
runs against the final target file/inode)
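
One way to observe this from user space is to compare the device and inode
numbers that stat() reports for each path. The Python sketch below assumes a
POSIX-like system; same_file is a name chosen for the example, and Python's
own os.path.samefile() performs an equivalent check.

```python
import os

def same_file(path_a, path_b):
    """True if both paths resolve to the same device and inode."""
    a = os.stat(path_a)  # stat() follows symlinks to the final target
    b = os.stat(path_b)
    return (a.st_dev, a.st_ino) == (b.st_dev, b.st_ino)
```

A hard link and a symlink to the same log file both compare equal to the
original path, whatever their names are.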

-- 
Michael Sinz                     Technology and Engineering Director/Consultant
"Starting Startups"                                mailto:michael.sinz@sinz.org
My place on the web                            http://www.sinz.org/Michael.Sinz

Re: [PROPOSAL] Return of the (svnserve) log

Posted by Malcolm Rowe <ma...@farside.org.uk>.

On Mon, Oct 31, 2005 at 01:49:25AM -0500, Greg Hudson wrote:
> > I don't see how this can guarantee what we need. While whatever a write()
> > sends out is guaranteed to be atomic, the write() function is capable of
> > doing partial writes, and there's no way to be certain that it won't do
> > that.
> 
> I believe that for regular files on a local filesystem, write() does all
> or nothing.  I don't have an authoritative source, since I didn't manage
> to pull up either the SUS or POSIX standards on the web easily.
> 

http://www.opengroup.org/onlinepubs/009695399/

In general, it's implementation-defined for regular files (FIFOs and
pipes have their own rules, of course).  Some implementations make
certain small writes atomic, for example.

But for log files, atomic writes _are_ pretty-much guaranteed in
practice, by:

"If the O_APPEND flag of the file status flags is set, the file offset
shall be set to the end of the file prior to each write and no intervening
file modification operation shall occur between changing the file offset
and the write operation."

Or, in other words, if you're appending to a file, you'll always end
up writing to the end of the file, even if you're racing with another
write().  This assumes that libc or equivalent implements write()
by calling the underlying syscall only once, of course, but that's
generally true.
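
A minimal Python sketch of that pattern, with two independent descriptors
appending to the same file (the helper names are invented for the example;
os.open and os.write map directly onto open() and write()):

```python
import os

def open_log(path):
    """Open a log file for appending, creating it if necessary."""
    return os.open(path, os.O_WRONLY | os.O_APPEND | os.O_CREAT, 0o644)

def append_entry(fd, line):
    """Write one complete log line with a single write() call."""
    return os.write(fd, line.encode("ascii"))
```

If two descriptors from open_log() take turns calling append_entry(), each
line lands at the current end of the file rather than at that descriptor's
previous offset, so neither overwrites the other.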

I think signals and resource limits are a red herring: we don't see
non-fatal signals in svnserve with any frequency, and resource limits
will hit every process anyway.

Regards,
Malcolm

Re: [PROPOSAL] Return of the (svnserve) log

Posted by Greg Hudson <gh...@MIT.EDU>.

On Sun, 2005-10-30 at 12:45 -0600, Jonathan Gilbert wrote:
> At 10:48 AM 30/10/2005 -0500, Greg Hudson wrote:
> >At any rate, standard practice for logfiles is to open them in append
> >mode and write out each log message in a single write(), which is
> >guaranteed to be atomic.  If we do that, then multiple handles to the
> >same file due to symlinks should be a resource-consumption issue only,
> >not a correctness issue.
> 
> I don't see how this can guarantee what we need. While whatever a write()
> sends out is guaranteed to be atomic, the write() function is capable of
> doing partial writes, and there's no way to be certain that it won't do
> that.

I believe that for regular files on a local filesystem, write() does all
or nothing.  I don't have an authoritative source, since I didn't manage
to pull up either the SUS or POSIX standards on the web easily.

(Logging to a network filesystem is always going to be a little dodgy.
Not much we can do about that, I think.)

>  The only way I can see to be *absolutely sure* is to use an OS-level
> synchronization function. Since cross-process synchronization is likely
> more expensive than in-process synchronization, perhaps this should be
> selected at startup based on the user's choice of connection mode (threads
> vs. fork).
[...]
> Another minor issue is that the Linux man page for write() (section 2)
> indicates that there exist filesystems where write() doesn't even guarantee
> that space on the device has been reserved for the data, let alone that
> the data has been written. If the two file handles don't know about each
> other, then we also need to fsync() after every write(), and this creates a
> race condition in the absence of synchronization.

If a filesystem gives up on returning an error from write() on disk
full, then pretty much every application is sunk when it comes to
graceful filled disk recovery.  Subversion won't be unusual in that
regard.

If the filesystem has not actually written out the data, that doesn't
mean the kernel isn't ensuring proper append semantics.  POSIX requires
that when write() returns, the data is reflected by a subsequent read()
in any process, and local filesystems generally conform to that
constraint.

At any rate, locking or fsyncing for each log message would be a
performance killer, so even if there are edge cases for simply opening
for append and writing, I think we should do it anyway.

Re: [PROPOSAL] Return of the (svnserve) log

Posted by Jonathan Gilbert <o2...@sneakemail.com>.

At 10:48 AM 30/10/2005 -0500, Greg Hudson wrote:
>At any rate, standard practice for logfiles is to open them in append
>mode and write out each log message in a single write(), which is
>guaranteed to be atomic.  If we do that, then multiple handles to the
>same file due to symlinks should be a resource-consumption issue only,
>not a correctness issue.

I don't see how this can guarantee what we need. While whatever a write()
sends out is guaranteed to be atomic, the write() function is capable of
doing partial writes, and there's no way to be certain that it won't do
that. The only way I can see to be *absolutely sure* is to use an OS-level
synchronization function. Since cross-process synchronization is likely
more expensive than in-process synchronization, perhaps this should be
selected at startup based on the user's choice of connection mode (threads
vs. fork).

If I'm wrong, does anyone have an authoritative source of information that
guarantees that write() will either write all of its data or none?

Another minor issue is that the Linux man page for write() (section 2)
indicates that there exist filesystems where write() doesn't even guarantee
that space on the device has been reserved for the data, let alone that
the data has been written. If the two file handles don't know about each
other, then we also need to fsync() after every write(), and this creates a
race condition in the absence of synchronization.

Jonathan Gilbert

Re: [PROPOSAL] Return of the (svnserve) log

Posted by Greg Hudson <gh...@MIT.EDU>.

On Sun, 2005-10-30 at 10:16 +0000, Max Bowsher wrote:
> Simply canonicalize, then strcmp the filenames to tell if they are the same 
> file, and then take an advisory write lock on the files to guard against the 
> server admin doing something stupid with symlinks.

In Unix systems, you can't lock against your own process, so we can't
guard against weird symlink cases that way.
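
To illustrate with fcntl()-style record locks, which are owned by the
process rather than by the descriptor, here is a small Python sketch
(Unix only; try_lock is a name made up for the example):

```python
import fcntl
import os

def try_lock(fd):
    """Attempt a non-blocking exclusive fcntl()-style lock; True on success."""
    try:
        fcntl.lockf(fd, fcntl.LOCK_EX | fcntl.LOCK_NB)
        return True
    except OSError:
        return False
```

If one process opens the same file twice and calls try_lock() on both
descriptors, both calls succeed, so the second open is never told that the
file is already in use.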

At any rate, standard practice for logfiles is to open them in append
mode and write out each log message in a single write(), which is
guaranteed to be atomic.  If we do that, then multiple handles to the
same file due to symlinks should be a resource-consumption issue only,
not a correctness issue.

Re: [PROPOSAL] Return of the (svnserve) log

Posted by Max Bowsher <ma...@ukf.net>.

Jonathan Gilbert wrote:
> Hello,
>
> I've been using SVN for a while now and I realized suddenly, "svnserve is
> quite similar to httpd, so why doesn't it log events like an httpd does?"
> Actually, I've been subscribed to the dev list for a while, so the real
> thought was "Logging facilities for svnserve were discussed in July, so why
> doesn't it *already* log events like an httpd does?" I brought up the issue
> on the IRC channel and a discussion ensued in which I learned that Sussman
> had been planning to implement it, but only finished mod_dav_svn before the
> "Big 1.3 Push". Before he could sink his teeth into the svnserve half of
> it, he woke up one morning and found himself an employee of Google instead
> of CollabNet :-) As a result, Sussman no longer has time to implement the
> logging facility for svnserve. I was thinking I might be up to the task :-)
>
> I'm going to present a list of features that I thought would be good for a
> base implementation. I know there is a strong urge to K.I.S.S., but I will
> present a rationale for each of the features, and in any event, none of
> these features are particularly difficult to implement properly.

Excellent, a design proposal is _definitely_ the right place to start.

> The basic concept of logging is, as I started out saying, the same as that
> for a web server such as Apache. There are a number of classes of logs, of
> which at least two are "access" and "error" (to use the Apache
> terminology). The "access" log shows successful day-to-day transactions,
> such as checkout/export, commit, mkdir, list, what have you. The "error"
> log shows the same kind of actions, but indicates their failure. For
> instance, if I attempt to list a directory which doesn't exist, the log
> entry for that failed request goes into the "error" log, not the "access"
> log. Sussman also mentioned that a third class of log should be
> implemented: "authorization", so that those events which directly pertain
> to security can be painlessly split off.
>
> What follows is the list of features that I feel are appropriate, with a
> brief explanation of why I think each one should be the way it is:
>
> ------------------------------------------------------------------------
> 1. Possible targets: syslogd, Windows Event Log service, and flat files.
>
> Sussman told me on IRC that the previous discussion of the topic had
> decided that flat files would probably be redundant, as syslogd can simply
> be configured to redirect the SVN log entries to their own file.
>
[snip reasoning I agree with]
> I believe it is important to support directly writing to flat files.

+1. An additional reason is to support logging from a svnserve daemon run by 
a non-root user, who won't be able to tweak syslog configuration.

> ------------------------------------------------------------------------
> 2. Classes of events: auth, access, and error.
>
> Different system administrators will be interested in different things.
> Some people are interested in knowing precisely who is communicating from
> where and what user they are purporting to be. Other people are interested
> to know what parts of their repository are being accessed the most.
> Probably a fair number of people are interested in being able to detect
> attacks on their server, which could take the form of denial of service.
> For these reasons, I believe it is important to divide logs up primarily
> into these 3 categories:
>
> - Auth events (Authentication & Authorization), which involve people
> identifying themselves to the server and requesting resources, would
> indicate what user account, if any, the user had provided, what IP address
> they were connecting from, and, most importantly, whether the attempt was
> successful. Failed authorization attempts (attempting to write when the
> repository is read-only) would also indicate which resource the user had
> attempted to access without authority.

Why failed only? "User X was granted access to something" isn't very useful.
It would probably also be appropriate to log the SASL mechanism in use.


> - Access events, which involve people successfully working with the server
> using day-to-day functions like "checkout", "commit", "list", "mkdir",
> would indicate which user & IP address the request had come from, what the
> request type was, and which resource the request involved. If there are
> some options which the server can discern for certain requests (such as
> perhaps a request for recursion), these should also be noted if they are
> available.

A first draft of the list of the events would then be simply the list of 
main commands, from the ra_svn protocol document.

> - Error events, which involve people who have successfully authenticated
> with the server asking it to perform an action it cannot or will not do,
> would indicate similar information as access events but also indicate the
> cause of the failure, perhaps through the use of a status code.
>
> Of course, not everyone will want error events split off from access
> events. Functionality for this is discussed in feature #4 below :-)
>
> ------------------------------------------------------------------------
> 3. Common file format for all plain text log data.
>
> While auth, access and error events do not log precisely the same sets of
> information, it should be (as discussed in feature #4 below) at least
> possible for an administrator to combine all log information into a single
> file. When web servers proliferated and came under widespread use, tools
> emerged for analyzing the log files produced by servers and providing
> statistics and other analyses. While svnserve is less likely to attract
> such tools, it doesn't cost us anything to at least use a common format
> when logging any class of event.

Hmm. Apache's access and error logs do not use a common format. I'm of the 
opinion that the intrinsically different nature of the events will prevent a 
fully common format. Instead, perhaps the first few fields on a line can be 
common, with the later fields being class-specific?

> In order to be human-readable, such a format should be plain ASCII text,
> similar in nature to a web server's logs (this is only an issue on Windows,
> where the Event Log service allows arbitrary binary data to be logged). In
> order to be machine-parseable, the format should have a fixed number of
> fields delimited by spaces, and fields whose content could potentially
> contain spaces should (always) be enclosed in quotation marks, with some
> provision for escaping (I'm thinking of URLs here, primarily, and as auth
> events would not necessarily include a URL, the field could be encoded
> using two adjacent quotes in the file "").

I don't know exactly what Apache does, but it might be worth checking up on
that, and possibly following their example.
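
To make the "common leading fields, class-specific trailing fields" idea
concrete, here is a minimal sketch in Python. The field order, the ISO-style
timestamp, the backslash escaping and the names quote_field and format_event
are all assumptions made for illustration; nothing in this thread has settled
on them, and this is not what Apache or svnserve actually does.

```python
from datetime import datetime, timezone


def quote_field(value):
    """Quote a field that may contain spaces; None becomes two adjacent quotes."""
    if value is None:
        return '""'
    escaped = (
        value.replace("\\", "\\\\")  # escape the escape character first
        .replace('"', '\\"')         # then embedded quotation marks
        .replace("\n", "\\n")        # keep every event on a single line
    )
    return f'"{escaped}"'


def format_event(when, event_class, user, address, *class_fields):
    """Build one log line: four common fields, then class-specific fields.

    `when` is assumed to already be in UTC.
    """
    common = [
        when.strftime("%Y-%m-%dT%H:%M:%SZ"),
        event_class,
        quote_field(user),
        address,
    ]
    return " ".join(common + [quote_field(f) for f in class_fields])


line = format_event(
    datetime(2005, 10, 30, 4, 13, 30, tzinfo=timezone.utc),
    "access", "jrandom", "192.0.2.7", "checkout", "/trunk/my docs",
)
print(line)
```

A parser only has to understand the first four fields to merge or filter
mixed logs; everything after them can vary by event class.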

> The exact format of the log
> messages will depend on precisely which data is available, which is
> something I will determine when I review svnserve's architecture and the
> existing logging facility added to mod_dav_svn by Sussman. It will,
> however, most certainly include a date & timestamp.
>
> ------------------------------------------------------------------------
> 4. Config file syntax to allow multiple classes of events to be logged to
> the same flat file.
>
> As mentioned earlier, some people will inevitably wish to have a
> consolidated log format (myself, for instance), a mechanism to allow
> multiple logs to be directed at the same file is required. While this could
> be done by simply requesting the same filename:
>
> error-log = access_and_error.log
> access-log = access_and_error.log
>
> .. the appearance of this in the config file raises questions in the user's
> head: Will the server be smart enough to canonicalize the paths & check
> that they are the same file, or will it open the same file with two
> separate handles and completely mangle the resulting log data? In addition,
> the implementation of this kind of checking is troublesome to say the least.
>
> In order to simplify implementation and remove this dubious appearance from
> the config file, I propose the following syntax:
>
> logfile-1 = access_and_error.log
> logfile-2 = auth.log
>
> error-log = file 1
> access-log = file 1
> auth-log = file 2
>
> The precise format of the right-hand-side of each "-log" entry would be to
> allow one of the following:
>
> "file N", to use the file indicated by the "logfile-N" directive,
> "syslog", to use the UNIX syslog facility (an error on unsupporting
> systems), "WindowsEventLog", to use the Windows API (an error when not on
> Windows).
>
> This syntax suggests a level of abstraction between the event sinks and the
> output mechanisms, which I believe is the best way to implement the
> functionality.

No, *PLEASE*, no. The whole numbered logfile idea might make sense to you, 
having approached this from a "how shall I code this" perspective, but it 
feels painfully unintuitive from an outside perspective.

Simply canonicalize, then strcmp the filenames to tell if they are the same 
file, and then take an advisory write lock on the files to guard against the 
server admin doing something stupid with symlinks.

Then, put a commented-out example in the default config file to demonstrate
that this is possible.
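
A rough sketch of the canonicalize-and-compare approach, in Python rather
than the C that svnserve would use. The function names and the dict-shaped
config are hypothetical. os.path.realpath already resolves symlinks, so the
advisory lock is left out here; it would still be needed to catch hard links
or a second server process, and locking calls differ per platform.

```python
import os


def canonical(path):
    """Resolve symlinks and relative parts so equal files compare equal."""
    return os.path.normcase(os.path.realpath(path))


def open_log_sinks(config):
    """Map each log class to a file object, sharing one object per real file.

    `config` is a hypothetical mapping such as
    {"error-log": "access_and_error.log", "access-log": "access_and_error.log"}.
    """
    by_path = {}
    sinks = {}
    for log_class, filename in config.items():
        key = canonical(filename)
        if key not in by_path:
            # Append mode: never truncate an existing log on startup.
            by_path[key] = open(key, "a", encoding="utf-8")
        sinks[log_class] = by_path[key]
    return sinks
```

With this, two "-log" entries naming the same file end up writing through a
single handle, so the admin can simply repeat the filename in the config.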

> ------------------------------------------------------------------------
> 5. Configurable behaviour for failure to log an event.
>
> Some people are interested in logged information for important security
> reasons; they will see it as an audit trail. Other users of SVN, such as
> myself, will be interested purely for informational purposes.
>
> When an audit trail is being produced and the target device becomes full or
> otherwise unable to accommodate a log entry, everything grinds to a
> terrifying halt, because it would be completely unacceptable to permit
> events to proceed without logging them when the administrator has
> specifically requested an audit trail. However, if the log information is
> not being considered a vital source of information about the behaviour
> patterns of those with access to the repository, it would be inappropriate
> to deny service in the event that actions cannot be logged.
>
> Therefore, I propose a property to be applied independently to classes of
> logs which makes that class guarantee an audit. Disabled by default, this
> property would make svnserve refuse to handle requests if it failed to log
> them.
>
> I propose the following name for the property in the config file:
>
> auth-log-auditing = on
> access-log-auditing = off
>
> If auditing is disabled and logging fails, I propose that svnserve first
> attempt to directly log the failure (not the event itself) to syslog, and
> if that fails, write it to stderr, which may or may not show up on the
> system's console. Understand that this is a last resort :-)

The config property name is not at all intuitive, but the rest of the 
discussion is good.
I'd suggest "fail-on-{access,auth}-log-failure" is more likely to allow 
someone seeing it without any prior specific knowledge to understand what it 
does.

I'm also not sure that writing to syslog (at least unconditionally) is the
right thing to do. Without a configured facility and severity, who knows
where the log event would go?
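
For illustration, the two failure policies could look roughly like this.
This is a Python sketch only; EventLog, LogWriteError and the
fail_on_log_failure flag are made-up names mirroring the suggested config
property, and the non-auditing path goes straight to stderr instead of
syslog, given the doubt above about where an unconfigured syslog message
would end up.

```python
import sys


class LogWriteError(Exception):
    """Raised when an audited log cannot be written; the request must be refused."""


class EventLog:
    def __init__(self, stream, fail_on_log_failure=False, fallback=sys.stderr):
        self.stream = stream
        self.fail_on_log_failure = fail_on_log_failure
        self.fallback = fallback

    def write(self, line):
        try:
            self.stream.write(line + "\n")
            self.stream.flush()
        except (OSError, ValueError) as exc:  # full disk, closed file, ...
            if self.fail_on_log_failure:
                # Audit trail requested: do not let the request proceed unlogged.
                raise LogWriteError(str(exc)) from exc
            # Informational log: report the failure (not the event) and carry on.
            print(f"svnserve: could not write log entry: {exc}", file=self.fallback)
```

The caller would treat LogWriteError as "refuse this request", which keeps
the policy decision in one place per log class.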

> These 5 features seem to me fundamental to any properly functional & usable
> logging system for svnserve. If I've missed anything important, just let me
> know, of course :-) I'm interested to hear everyone's thoughts on what I've
> written here and on logging in svnserve.

One thing that might be worth (at least briefly) considering is: Can we 
borrow code from Apache to help us with this? (And then long term, feed that 
code back into APR(-util))

Max.


---------------------------------------------------------------------
To unsubscribe, e-mail: dev-unsubscribe@subversion.tigris.org
For additional commands, e-mail: dev-help@subversion.tigris.org

Re: [PROPOSAL] Return of the (svnserve) log

Posted by Erik Huelsmann <eh...@gmail.com>.

> As I see it, we can have either log files in the locale encoding of the
> server process, or always in UTF8. I like the latter, since it always can
> represent all characters (no fuzzy encodings), and it is easy to translate
> if you need to.

> Thoughts?

+1 on UTF-8


bye,

Erik.

Re: [PROPOSAL] Return of the (svnserve) log

Posted by "Peter N. Lundblad" <pe...@famlundblad.se>.

On Sat, 29 Oct 2005, Jonathan Gilbert wrote:

> In order to be human-readable, such a format should be plain ASCII text,
> similar in nature to a web server's logs (this is only an issue on Windows,

I'm not sure if you really mean the 7-bit ASCII encoding when you write
"ASCII", or you just mean plain text.

Regardless, we need to think about the encoding of the log files, since it
will contain paths (and other strings) with non-ASCII characters. I
remember a discussion about encoding in the Apache log when that feature
was added right before 1.3 was branched. I don't know the details, but it
seems like Apache logs are limited to 7-bit characters (meaning the range
below 128). I would hate it if we limited ourselves to this subset when
designing a new format.

As I see it, we can have either log files in the locale encoding of the
server process, or always in UTF8. I like the latter, since it always can
represent all characters (no fuzzy encodings), and it is easy to translate
if you need to.
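
A small Python sketch of the difference, with a made-up path and a helper
name (try_encode) chosen only for this example: a fixed single-byte or 7-bit
encoding cannot represent every path, while UTF-8 always round-trips.

```python
def try_encode(text, encoding):
    """Return the encoded bytes, or None if the encoding cannot represent text."""
    try:
        return text.encode(encoding)
    except UnicodeEncodeError:
        return None


path = "/trunk/r\u00e4ksm\u00f6rg\u00e5s.txt"  # a path with non-ASCII characters

for encoding in ("ascii", "latin-1", "utf-8"):
    data = try_encode(path, encoding)
    print(encoding, "cannot represent this path" if data is None else data)
```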

Thoughts?

//Peter
