Posted to users@tomcat.apache.org by Martin Kouba <ma...@symbiont-it.cz> on 2011/05/24 13:50:05 UTC
CrawlerSessionManagerValve question
What is the reason NOT to assume that a request with more than one
User-Agent header originates from a bot?
See lines 133, 134 in Tomcat 7.0.14.
Thanks
Martin
---------------------------------------------------------------------
To unsubscribe, e-mail: users-unsubscribe@tomcat.apache.org
For additional commands, e-mail: users-help@tomcat.apache.org
Re: CrawlerSessionManagerValve question
Posted by André Warnier <aw...@ice-sa.com>.
Mark Thomas wrote:
> On 24/05/2011 12:50, Martin Kouba wrote:
>> What is the reason NOT to assume that a request with more than one
>> User-Agent header originates from a bot?
>> See lines 133, 134 in Tomcat 7.0.14.
>
> Simply that none of the samples I looked at had multiple UA headers and
> a suggestion from another committer that skipping those requests might
> be a way to save a few cycles.
>
> If you have traces that show multiple headers, I'd be interested in
> seeing them.
>
From the RFC police:
RFC 2616, 4.2 Message Headers:
Multiple message-header fields with the same field-name MAY be present in a message if and
only if the entire field-value for that header field is defined as a comma-separated list
[i.e., #(values)].
(note the "if and only if")
RFC 2616, 14.43 User-Agent
User-Agent = "User-Agent" ":" 1*( product | comment )
(so *not* defined as '#(values)')
==> (my interpretation): multiple User-Agent headers are invalid.
Discussion:
14.43 otherwise says:
The field can contain multiple product tokens (section 3.8) and comments identifying the
agent and any subproducts which form a significant part of the user agent. By convention,
the product tokens are listed in order of their significance for identifying the application.
and 4.2 otherwise says:
It MUST be possible to combine the multiple header fields into one "field-name:
field-value" pair, without changing the semantics of the message, by appending each
subsequent field-value to the first, each separated by a comma. The order in which header
fields with the same field-name are received is therefore significant to the
interpretation of the combined field value, and thus a proxy MUST NOT change the order of
these field values when a message is forwarded.
Thus, if one were to accept multiple User-Agent headers, and combine them as a
comma-separated list, one would then have trouble respecting the "order of their
significance" as expressed in 14.43.
So it makes sense to allow only one User-Agent header.
And maybe the "lines 133, 134 in Tomcat 7.0.14" should be modified to reject the request
if it has more than one such header?
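To make the argument above concrete, here is a minimal Java sketch. It is not the valve's actual code. The class UserAgentHeaders and its methods are hypothetical names, and it assumes the header values arrive as an Enumeration<String>, such as the one returned by HttpServletRequest.getHeaders("User-Agent"). It shows how a request could be checked for exactly one User-Agent header, and what the comma-joined form described in section 4.2 would look like.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.Enumeration;
import java.util.List;

// Hypothetical helper for illustration only; not part of Tomcat.
final class UserAgentHeaders {

    // Collects every value of a repeated header, preserving arrival order.
    static List<String> values(Enumeration<String> headers) {
        List<String> result = new ArrayList<>();
        while (headers != null && headers.hasMoreElements()) {
            result.add(headers.nextElement());
        }
        return result;
    }

    // True only when exactly one header value was sent.
    static boolean hasSingleValue(Enumeration<String> headers) {
        return values(headers).size() == 1;
    }

    // Joins the values as RFC 2616 section 4.2 describes: each subsequent
    // field-value appended to the first, separated by a comma (a space is
    // added after the comma here for readability).
    static String combine(Enumeration<String> headers) {
        return String.join(", ", values(headers));
    }

    public static void main(String[] args) {
        // Made-up header values, purely for demonstration.
        List<String> sent = Arrays.asList("Mozilla/5.0 (compatible)", "ExampleBot/1.0");
        System.out.println(hasSingleValue(Collections.enumeration(sent)));
        System.out.println(combine(Collections.enumeration(sent)));
    }
}
```

The combined string is no longer a plain sequence of product tokens and comments as the 14.43 grammar requires, which is the difficulty described above. What to do when hasSingleValue returns false (reject the request, skip it, or treat it as a bot) is exactly the open question in this thread, and the sketch does not settle it.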
Re: CrawlerSessionManagerValve question
Posted by Mark Thomas <ma...@apache.org>.
On 24/05/2011 12:50, Martin Kouba wrote:
> What is the reason NOT to assume that a request with more than one
> User-Agent header originates from a bot?
> See lines 133, 134 in Tomcat 7.0.14.
Simply that none of the samples I looked at had multiple UA headers and
a suggestion from another committer that skipping those requests might
be a way to save a few cycles.
If you have traces that show multiple headers, I'd be interested in
seeing them.
Cheers,
Mark