Posted to dev@cocoon.apache.org by Bert Van Kets <be...@apache.org> on 2002/07/19 21:59:48 UTC

web server log file parsing

One of the projects I'm working on uses Cocoon to create a web server
log analysis tool.  It's perfect for generating the graphs and reports in
different formats.
The one thing I'm struggling with is parsing the log files.  There are
several ways of doing this:
1. read the log files on every report request and create a SAX stream
   (see the sketch below)
2. parse the not-yet-imported log files into a database and use that to
   generate a SAX stream.  This is a two-stage system.
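For option 1, a minimal sketch of the per-line parsing step (plain Java, not Cocoon-specific; the class name is made up and the regular expression assumes Common Log Format):

    import java.util.regex.Matcher;
    import java.util.regex.Pattern;
    import org.xml.sax.ContentHandler;
    import org.xml.sax.SAXException;
    import org.xml.sax.helpers.AttributesImpl;

    /** Hypothetical helper: turn one Common Log Format line into SAX events. */
    public class LogLineEmitter {

        // host ident user [timestamp] "request" status bytes
        private static final Pattern CLF = Pattern.compile(
            "^(\\S+) (\\S+) (\\S+) \\[([^\\]]+)\\] \"([^\"]*)\" (\\d{3}) (\\S+)");

        public void emit(String line, ContentHandler handler) throws SAXException {
            Matcher m = CLF.matcher(line);
            if (!m.find()) {
                return; // skip malformed lines
            }
            AttributesImpl atts = new AttributesImpl();
            atts.addAttribute("", "host",    "host",    "CDATA", m.group(1));
            atts.addAttribute("", "date",    "date",    "CDATA", m.group(4));
            atts.addAttribute("", "request", "request", "CDATA", m.group(5));
            atts.addAttribute("", "status",  "status",  "CDATA", m.group(6));
            atts.addAttribute("", "bytes",   "bytes",   "CDATA", m.group(7));
            handler.startElement("", "entry", "entry", atts);
            handler.endElement("", "entry", "entry");
        }
    }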

As log files can grow very large, copying them to the database will give me
not only a large log directory, but also a large database.  Once the data
is in the database the speed advantages are of course obvious.
Reading the data from the log files on every request has a
disadvantage in processing speed, but does not use gigabytes of disk
space.  WebTrends does it this way for exactly the same reason.

What do you guys think?

Bert





Re: web server log file parsing

Posted by Peter Royal <pr...@apache.org>.
On Friday 19 July 2002 03:59 pm, Bert Van Kets wrote:
> Reading the data from the log files on every request has a
> disadvantage in processing speed, but does not use gigabytes of disk
> space.  WebTrends does it this way for exactly the same reason.
>
> What do you guys think?

I'd parse on each request and use Cocoon caching!

Check out Chaperon, http://chaperon.sourceforge.net/. It is a text->XML 
converter that already has Cocoon support (as a generator) and is cacheable.
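
A rough illustration of the caching idea in plain Java (this is not the actual Cocoon or Chaperon API; keying the cache on the file's path and last-modified time is just one plausible validity check):

    import java.io.File;
    import java.util.Map;
    import java.util.concurrent.ConcurrentHashMap;
    import java.util.function.Function;

    /** Illustrative only: re-parse a log file only when it has changed. */
    public class ParsedLogCache {

        private static final class Entry {
            final long lastModified;
            final Object parsed;
            Entry(long lastModified, Object parsed) {
                this.lastModified = lastModified;
                this.parsed = parsed;
            }
        }

        private final Map<String, Entry> cache = new ConcurrentHashMap<>();

        public Object get(File logFile, Function<File, Object> parser) {
            long stamp = logFile.lastModified();
            Entry e = cache.get(logFile.getPath());
            if (e == null || e.lastModified != stamp) {
                // File is new or has changed since we cached it: re-parse.
                e = new Entry(stamp, parser.apply(logFile));
                cache.put(logFile.getPath(), e);
            }
            return e.parsed;
        }
    }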
-pete

-- 
peter royal -> proyal@apache.org



Re: web server log file parsing

Posted by Valentin Richter <Va...@raytion.com>.
At 21:59 +0200 on 19.07.2002, Bert Van Kets wrote:
>One of the projects I'm working on uses Cocoon to create a web server log analysis tool.
>[...]
>What do you guys think?
>
>Bert

Just to suggest an alternative approach: if your main concern is not developing another log analysis tool, but having nicely formatted reports and using Cocoon to do the necessary work, then you might use Analog to analyse the log files and feed its output to Cocoon.

Analog

  http://www.analog.cx/

has a parameter for producing its results in a computer-friendly format, i.e.

   OUTPUT COMPUTER
 
which produces a text file with comma- or tab-separated values. See

   http://www.analog.cx/docs/output.html

and

   http://www.analog.cx/docs/compout.html

for details. Unfortunately, there seems to be no XML output option available.
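
Since there is no XML output, one way to bridge the gap is a small converter from the tab-separated OUTPUT COMPUTER data to XML. A minimal sketch (the element names are made up; see the compout.html page above for what the fields actually mean):

    import java.io.BufferedReader;
    import java.io.FileReader;

    /** Illustrative converter: Analog's tab-separated output to simple XML. */
    public class AnalogToXml {

        public static void main(String[] args) throws Exception {
            StringBuilder xml = new StringBuilder("<report>\n");
            try (BufferedReader in = new BufferedReader(new FileReader(args[0]))) {
                String line;
                while ((line = in.readLine()) != null) {
                    xml.append("  <row>");
                    // Field meaning depends on the report type; wrap generically.
                    for (String field : line.split("\t")) {
                        xml.append("<field>").append(escape(field)).append("</field>");
                    }
                    xml.append("</row>\n");
                }
            }
            xml.append("</report>\n");
            System.out.print(xml);
        }

        private static String escape(String s) {
            return s.replace("&", "&amp;").replace("<", "&lt;").replace(">", "&gt;");
        }
    }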

Although Analog is blazingly fast (if you disable DNS lookups), I would still recommend not running it for each request. One strategy that works for us is to rotate the log files daily at midnight, compress them using gzip, run Analog on the compressed files, and put the results into a database. Running Analog on the compressed files is, by the way, two to four times _faster_ than on uncompressed files, probably because it needs only a tenth of the number of disk accesses.
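
Reading the compressed files directly is straightforward too; a minimal Java sketch of streaming a gzipped log line by line (the file name is just an example):

    import java.io.BufferedReader;
    import java.io.FileInputStream;
    import java.io.InputStreamReader;
    import java.util.zip.GZIPInputStream;

    /** Illustrative: stream a gzip-compressed log without unpacking it. */
    public class GzipLogReader {

        public static void main(String[] args) throws Exception {
            try (BufferedReader in = new BufferedReader(new InputStreamReader(
                    new GZIPInputStream(new FileInputStream("access_log.gz"))))) {
                String line;
                while ((line = in.readLine()) != null) {
                    System.out.println(line); // hand each line to the parser instead
                }
            }
        }
    }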

Valentin Richter

Raytion GmbH
Kaiser-Friedrich-Ring 74
40547 Düsseldorf
Germany
mailto:valentin.richter@raytion.com
http://www.raytion.com

