Posted to dev@cocoon.apache.org by Bert Van Kets <be...@apache.org> on 2002/07/19 21:59:48 UTC
web server log file parsing
One of the projects I'm working on is using Cocoon to create a web server
log analysis tool. It's perfect for generating the graphs and reports in
different formats.
The one thing I'm struggling with is parsing the log files. There are
several ways of doing this:
1. read the logs on every request for a given period and create a SAX stream
2. parse the unread log files into a database and use that to generate a
SAX stream. This is a two-stage system.
As log files can grow very large, copying them to the database will give me
not only a large log directory but also a large database. Once the data
is in the database, the speed advantages are of course obvious.
Reading the data from the log files on every request has a
processing-speed disadvantage, but does not use gigabytes of disk
space. WebTrends does it this way for exactly the same reason.
What is your opinion, guys?
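For what it's worth, option 1 can be sketched roughly like this (a minimal Python sketch, not Cocoon-specific; the Common Log Format regex and the attribute names are my own assumptions, and a real generator would emit SAX events rather than strings):

```python
import re

# Common Log Format, e.g.:
# 127.0.0.1 - frank [10/Oct/2000:13:55:36 -0700] "GET /index.html HTTP/1.0" 200 2326
CLF = re.compile(
    r'(?P<host>\S+) \S+ (?P<user>\S+) \[(?P<time>[^\]]+)\] '
    r'"(?P<request>[^"]*)" (?P<status>\d{3}) (?P<bytes>\S+)'
)

def log_to_xml(lines):
    """Stream log lines and yield one <hit/> element per parsed line."""
    yield '<log>'
    for line in lines:
        m = CLF.match(line)
        if m is None:
            continue  # skip malformed lines instead of aborting the report
        yield ('<hit host="%(host)s" time="%(time)s" status="%(status)s" '
               'bytes="%(bytes)s"/>' % m.groupdict())
    yield '</log>'

sample = ['127.0.0.1 - frank [10/Oct/2000:13:55:36 -0700] '
          '"GET /index.html HTTP/1.0" 200 2326']
xml = ''.join(log_to_xml(sample))
```

Because this streams line by line, it never holds the whole log in memory, which is what makes the "no database" option workable at all.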
Bert
---------------------------------------------------------------------
To unsubscribe, e-mail: cocoon-dev-unsubscribe@xml.apache.org
For additional commands, email: cocoon-dev-help@xml.apache.org
Re: web server log file parsing
Posted by Peter Royal <pr...@apache.org>.
On Friday 19 July 2002 03:59 pm, Bert Van Kets wrote:
> As log files can grow very large, copying them to the database will give me
> not only a large log directory but also a large database. Once the data
> is in the database, the speed advantages are of course obvious.
> Reading the data from the log files on every request has a
> processing-speed disadvantage, but does not use gigabytes of disk
> space. WebTrends does it this way for exactly the same reason.
>
> What is your opinion, guys?
I'd parse on each request and use Cocoon caching!
Check out Chaperon, http://chaperon.sourceforge.net/. It is a text->XML
converter that already has Cocoon support (as a generator) and is cacheable.
-pete
--
peter royal -> proyal@apache.org
Re: web server log file parsing
Posted by Valentin Richter <Va...@raytion.com>.
At 21:59 +0200 on 19.07.2002, Bert Van Kets wrote:
>One of the projects I'm working on is using Cocoon to create a web server log analysis tool. It's perfect for generating the graphs and reports in different formats.
>The one thing I'm struggling with is parsing the log files. There are several ways of doing this:
>1. read the logs on every request for a given period and create a SAX stream
>2. parse the unread log files into a database and use that to generate a SAX stream. This is a two-stage system.
>
>As log files can grow very large, copying them to the database will give me not only a large log directory but also a large database. Once the data is in the database, the speed advantages are of course obvious.
>Reading the data from the log files on every request has a processing-speed disadvantage, but does not use gigabytes of disk space. WebTrends does it this way for exactly the same reason.
>
>What is your opinion, guys?
>
>Bert
Just to suggest an alternative approach: if your main concern is not developing another log-analysis tool, but having nicely formatted reports and using Cocoon to do the necessary work, then you might use Analog to analyse the log files and feed its output to Cocoon.
Analog
http://www.analog.cx/
has a parameter for producing its results in a computer-friendly format:
OUTPUT COMPUTER
which produces a text file with comma- or tab-separated values. See
http://www.analog.cx/docs/output.html
and
http://www.analog.cx/docs/compout.html
for details. Unfortunately, there seems to be no XML output option available.
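Even without an XML option, that separated-values output is easy to pick apart before handing it to Cocoon. Here is a hypothetical Python sketch; the report codes in the sample data ('x', 'MO') are invented for illustration, so check compout.html for Analog's real codes and column layout:

```python
import csv
import io

# Analog's COMPUTER output is line-oriented: each line starts with a
# report-code field, followed by that report's columns, joined by a
# separator character (comma or tab).
def read_computer_output(text, sep=','):
    """Group rows of a separated-values report by their leading code field."""
    reports = {}
    for row in csv.reader(io.StringIO(text), delimiter=sep):
        if not row:
            continue
        reports.setdefault(row[0], []).append(row[1:])
    return reports

# Invented sample lines, only to show the grouping:
sample = "x,Program Started,2002-07-19\nMO,2002-06,1234\nMO,2002-07,5678\n"
reports = read_computer_output(sample)
```

Once the rows are grouped per report, each group can be turned into elements for a SAX stream or written straight into database tables.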
Although Analog is blazingly fast (if you disable DNS lookups), I would still recommend not running it for each request. One strategy which works for us is to rotate the log files daily at midnight, compress them using gzip, run Analog on the compressed files (which is, by the way, a factor of 2 to 4 _faster_ than on uncompressed files, probably because it needs only a tenth of the number of disk accesses), and put the results into a database.
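The last step of that strategy, loading each day's results into a database so report queries stay fast, could look roughly like this (a sketch using Python's bundled sqlite3; the table name, columns, and sample figures are made up):

```python
import sqlite3

def load_daily_results(conn, day, rows):
    """Insert one day's (page, hits) pairs; rerunning for a day replaces it."""
    conn.execute('CREATE TABLE IF NOT EXISTS hits (day TEXT, page TEXT, n INTEGER)')
    conn.execute('DELETE FROM hits WHERE day = ?', (day,))  # idempotent reload
    conn.executemany('INSERT INTO hits VALUES (?, ?, ?)',
                     [(day, page, n) for page, n in rows])
    conn.commit()

def top_pages(conn, limit=10):
    """Report query: total hits per page across all loaded days."""
    return conn.execute(
        'SELECT page, SUM(n) FROM hits GROUP BY page '
        'ORDER BY SUM(n) DESC LIMIT ?', (limit,)).fetchall()

conn = sqlite3.connect(':memory:')
load_daily_results(conn, '2002-07-19', [('/index.html', 120), ('/about.html', 30)])
load_daily_results(conn, '2002-07-20', [('/index.html', 80)])
top = top_pages(conn)
```

The delete-then-insert per day means the midnight job can safely be rerun after a failure without double-counting hits.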
Valentin Richter
Raytion GmbH
Kaiser-Friedrich-Ring 74
40547 Düsseldorf
Germany
mailto:valentin.richter@raytion.com
http://www.raytion.com