Posted to users@httpd.apache.org by Fredrik Steen <ap...@stone.nu> on 2005/06/02 11:13:52 UTC

[users@httpd] Apache 2.x remote logging

I'm in the process of building a web farm of somewhere between 10 and 30 web
servers handling 300k virtual hosts, and have started to look at how to
handle logging for these hosts.

Does anyone have any good ideas on how to handle that massive amount of log
files? Preferably using some sort of remote log collector.

I have looked at: 
 - mod-witch: http://savannah.nongnu.org/projects/mod-witch (Will syslog handle it?)
 - mod_log_spread: http://www.backhand.org/mod_log_spread/ (Apache 2.x module
   status?)
 - mod_log_sql: http://www.outoforder.cc/projects/apache/mod_log_sql/
   (Performance?)

Any recommendations or ideas to handle this?

--
Thanks, Fredrik Steen





---------------------------------------------------------------------
The official User-To-User support forum of the Apache HTTP Server Project.
See <URL:http://httpd.apache.org/userslist.html> for more info.
To unsubscribe, e-mail: users-unsubscribe@httpd.apache.org
   "   from the digest: users-digest-unsubscribe@httpd.apache.org
For additional commands, e-mail: users-help@httpd.apache.org


[users@httpd] Re: Apache 2.x remote logging

Posted by Fredrik Steen <ap...@stone.nu>.
Vizion <vi...@vizion.occoxmail.com> writes:

> On Thursday 02 June 2005 05:15,  the author Fredrik Steen contributed to the 
> dialogue on-
>  [users@httpd]  Re: Apache 2.x remote logging: 
>
>>The idea is to use the logs for generating statistics (daily) for each
>> virtual host. 
> In that case the virtual host http access data could remain on each machine 
> and you would have no need to use a central log collector for them. 


All the virtual hosts are shared between all of the web servers and then load
balanced, so every web server will have log file(s) for every virtual host (I
should of course have mentioned that in my previous mail, sorry).

The most straightforward thing to do would be to collect the log file(s) from
the web servers (split), merge and sort them, and generate statistics for
each virtual host.
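
As a rough illustration of the merge-and-sort step, here is a minimal Python
sketch. It is not taken from any of the modules mentioned in this thread. It
assumes every line carries the standard Apache %t timestamp in square
brackets, that each server's log is already in time order, and that the C or
an English locale is in use so that month names such as "Jun" parse. The
function names timestamp and merge_logs are made up for this example.

```python
import heapq
from datetime import datetime


def timestamp(line):
    """Parse the [dd/Mon/yyyy:HH:MM:SS +zzzz] field of an access log line."""
    start = line.index("[") + 1
    end = line.index("]", start)
    return datetime.strptime(line[start:end], "%d/%b/%Y:%H:%M:%S %z")


def merge_logs(streams):
    """Lazily merge several time-ordered iterables of log lines into one.

    Assumption: each input is already sorted by timestamp. If a server
    writes entries slightly out of order, sort that input first.
    """
    return heapq.merge(*streams, key=timestamp)
```

Because heapq.merge is lazy, the inputs can be open file objects and the
merged result can be written out line by line without loading whole logs
into memory. The merged file for one virtual host could then be handed to
the statistics tool.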

When thinking about it I assumed that this was a common problem for larger
setups and that there were some nice solutions for it. But after searching
mailing lists and the web, it seems that is not the case (at least for
Apache 2.x).

I appreciate your help.


> That seems the simple approach. You might want to consider automatically
> generating a summary report to each user every week with a link to the raw
> data, and tell them it will be held for X days and thereafter deleted. That
> way you remove archival demands and provide an appreciated
> service. If the summary file is designed so that it incorporates all the
> stats you need, then a copy of that email could be cc'd to a system which
> amalgamates the data and produces your own summary statistics. If that is
> workable for you it would mean you do not need to add any more complexity to
> your infrastructure.  My two penn'orth
>
>> We will use webalizer[1] and/or awstats[2] for presentation. 
>> The logs will be saved for X months and then deleted. That's the plan.


Thanks. 

-- 
.Fredrik Steen




Re: [users@httpd] Re: Apache 2.x remote logging

Posted by Vizion <vi...@vizion.occoxmail.com>.
On Thursday 02 June 2005 05:15,  the author Fredrik Steen contributed to the 
dialogue on-
 [users@httpd]  Re: Apache 2.x remote logging: 

>The idea is to use the logs for generating statistics (daily) for each
> virtual host. 
In that case the virtual host http access data could remain on each machine 
and you would have no need to use a central log collector for them. That 
seems the simple approach. You might want to consider automatically 
generating a summary report to each user every week with a link to the raw 
data, and tell them it will be held for X days and thereafter deleted. That 
way you remove archival demands and provide an appreciated service. If the 
summary file is designed so that it incorporates all the stats you need, then 
a copy of that email could be cc'd to a system which amalgamates the data and 
produces your own summary statistics. If that is workable for you it would 
mean you do not need to add any more complexity to your infrastructure.
My two penn'orth

> We will use webalizer[1] and/or awstats[2] for presentation. 
> The logs will be saved for X months and then deleted. That's the plan.
>

-- 
40 yrs navigating and computing in blue waters.
English Owner & Captain of British Registered 60' bluewater Ketch S/V Taurus.
 Currently in San Diego, CA. Sailing May/June bound for Europe via Panama 
Canal.



[users@httpd] Re: Apache 2.x remote logging

Posted by Fredrik Steen <ap...@stone.nu>.
Vizion <vi...@vizion.occoxmail.com> writes:

> With such a massive logging requirement I feel some analysis of the purpose 
> you intend to fulfill by using the logs could well drive the selection of 
> methods & processes for dealing with them and hence your choice of 
> solution. 
> Are there any top level plans which describe each purpose and some idea of 
> data volumes applicable for each purpose you feel able to share with us? 

The idea is to use the logs for generating statistics (daily) for each virtual
host. We will use webalizer[1] and/or awstats[2] for presentation. The logs
will be saved for X months and then deleted. That's the plan.

[1] http://www.mrunix.net/webalizer/
[2] http://awstats.sourceforge.net/
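
The "save for X months, then delete" part of the plan could be handled by a
small scheduled job. The Python sketch below is only an illustration under
stated assumptions: logs are plain files ending in ".log" somewhere under
one directory, age is judged by file modification time, and the retention
period is given in days. The names expired_logs and prune are hypothetical.

```python
import time
from pathlib import Path


def expired_logs(log_dir, max_age_days, now=None):
    """Return the *.log files under log_dir last modified before the cutoff."""
    now = time.time() if now is None else now
    cutoff = now - max_age_days * 86400  # retention period in seconds
    return sorted(
        path
        for path in Path(log_dir).rglob("*.log")
        if path.is_file() and path.stat().st_mtime < cutoff
    )


def prune(log_dir, max_age_days, dry_run=True):
    """Delete expired logs; with dry_run=True only report what would go."""
    victims = expired_logs(log_dir, max_age_days)
    if not dry_run:
        for path in victims:
            path.unlink()
    return victims
```

Defaulting to a dry run is a deliberate choice here: the list of files can
be checked before anything is actually removed.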

--
Thanks, Fredrik Steen




Re: [users@httpd] Apache 2.x remote logging

Posted by Vizion <vi...@vizion.occoxmail.com>.
With such a massive logging requirement I feel some analysis of the purpose 
you intend to fulfill by using the logs could well drive the selection of 
methods & processes for dealing with them and hence your choice of 
solution. 
Are there any top level plans which describe each purpose and some idea of 
data volumes applicable for each purpose you feel able to share with us? 

To give you an example of why I ask this question: some of your logging data 
may simply need to be analyzed and the raw data dumped. For such a purpose 
you have a choice of either local analysis and cyclical dumping of raw data, 
with forwarding of processed results to a central collector, or, 
alternatively, raw data forwarding and central or departmental analysis. The 
choice of technique may depend upon the usage point as well as considerations 
of bandwidth, processing and storage. (For example, do you need to have a 
centralized record of every http access to each of your virtual hosts' files, 
or is that data only needed by the virtual host, while you may need a 
centralized summary for statistical and marketing purposes?)
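
To make the first option (local analysis, forwarding only processed results)
concrete, here is a minimal Python sketch. It assumes each access log line
starts with the virtual host name and ends with the response size, as in a
format such as "%v %h %l %u %t \"%r\" %>s %b". How the per-server summaries
reach the central collector is left open. The names summarize and combine
are made up for this illustration.

```python
from collections import Counter


def summarize(lines):
    """Count requests and bytes sent per virtual host on one web server."""
    hits, sent = Counter(), Counter()
    for line in lines:
        fields = line.split()
        if len(fields) < 2:
            continue  # skip blank or truncated lines
        vhost = fields[0].lower()
        hits[vhost] += 1
        if fields[-1].isdigit():  # %b logs "-" when no body was sent
            sent[vhost] += int(fields[-1])
    return hits, sent


def combine(summaries):
    """Add up the (hits, bytes) summaries forwarded by every web server."""
    total_hits, total_sent = Counter(), Counter()
    for hits, sent in summaries:
        total_hits.update(hits)  # Counter.update adds counts
        total_sent.update(sent)
    return total_hits, total_sent
```

Only the small per-host counters cross the network in this arrangement, so
the raw logs can stay on (and be rotated off) each web server. The trade-off
is that anything not counted locally cannot be recovered centrally later.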

The sharing of a plan would help oil observations.
David



-- 
40 yrs navigating and computing in blue waters.
English Owner & Captain of British Registered 60' bluewater Ketch S/V Taurus.
 Currently in San Diego, CA. Sailing May/June bound for Europe via Panama 
Canal.
