You are viewing a plain text version of this content. The canonical link for it is here.
Posted to cvs@httpd.apache.org by Paul Sutton <pc...@hyperreal.com> on 1996/12/20 17:13:16 UTC
cvs commit: apache/htdocs/manual content-negotiation.html
pcs 96/12/20 08:13:15
Modified: htdocs/manual content-negotiation.html
Log:
Expand documentation of content negotiation for Apache 1.2 including
HTTP/1.1 stuff. Document the algorithm apache uses to choose a variant.
Revision Changes Path
1.5 +304 -97 apache/htdocs/manual/content-negotiation.html
Index: content-negotiation.html
===================================================================
RCS file: /export/home/cvs/apache/htdocs/manual/content-negotiation.html,v
retrieving revision 1.4
retrieving revision 1.5
diff -C3 -r1.4 -r1.5
*** content-negotiation.html 1996/12/12 01:09:38 1.4
--- content-negotiation.html 1996/12/20 16:13:14 1.5
***************
*** 1,57 ****
! <html>
! <head>
! <title>Apache server Content arbitration: MultiViews and *.var files</title>
! </head>
! <body>
<!--#include virtual="header.html" -->
! <h1>Content Arbitration: MultiViews and *.var files</h1>
! The HTTP standard allows clients (i.e., browsers like Mosaic or
! Netscape) to specify what data formats they are prepared to accept.
! The intention is that when information is available in multiple
! variants (e.g., in different data formats), servers can use this
! information to decide which variant to send. This feature has been
! supported in the CERN server for a while, and while it is not yet
! supported in the NCSA server, it is likely to assume a new importance
! in light of the emergence of HTML3 capable browsers. <p>
!
! The Apache module <A HREF="mod/mod_negotiation.html">mod_negotiation</A> handles
! content negotiation in two different ways; special treatment for the
! pseudo-mime-type <code>application/x-type-map</code>, and the
! MultiViews per-directory Option (which can be set in srm.conf, or in
! .htaccess files, as usual). These features are alternate user
! interfaces to what amounts to the same piece of code (in the new file
! <code>http_mime_db.c</code>) which implements the content negotiation
! portion of the HTTP protocol. <p>
!
! Each of these features allows one of several files to satisfy a
! request, based on what the client says it's willing to accept; the
! differences are in the way the files are identified:
<ul>
! <li> A type map (i.e., a <code>*.var</code> file) names the files
! containing the variants explicitly
! <li> In a MultiViews search, the server does an implicit filename
! pattern match, and chooses from among the results.
</ul>
! Apache also supports a new pseudo-MIME type,
! text/x-server-parsed-html3, which is treated as text/html;level=3
! for purposes of content negotiation, and as server-side-included HTML
! elsewhere.
!
! <h3>Type maps (*.var files)</h3>
!
! A type map is a document which is typed by the server (using its
! normal suffix-based mechanisms) as
! <code>application/x-type-map</code>. Note that to use this feature,
! you've got to have an <code>AddType</code> some place which defines a
! file suffix as <code>application/x-type-map</code>; the easiest thing
! may be to stick a
<pre>
! AddType application/x-type-map var
</pre>
in <code>srm.conf</code>. See comments in the sample config files for
--- 1,96 ----
! <!DOCTYPE HTML PUBLIC "-//IETF//DTD HTML 2.0//EN">
! <HTML>
! <HEAD>
! <TITLE>Apache Content Negotiation</TITLE>
! </HEAD>
! <BODY>
<!--#include virtual="header.html" -->
! <h1>Content Negotiation</h1>
! Apache's support for content negotiation has been updated to meet the
! HTTP/1.1 specification. It can choose the best representation of a
! resource based on the browser-supplied preferences for media type,
! languages, character set and encoding. It is also implements a
! couple of features to give more intelligent handling of requests from
! browsers which send incomplete negotiation information. <p>
!
! Content negotiation is provided by the
! <a href="mod/mod_negotiation.html">mod_negotiation</a> module,
! which is compiled in by default.
!
! <hr>
!
! <h2>About Content Negotiation</h2>
!
! A resource may be available in several different representations. For
! example, it might be available in different languages or different
! media types, or a combination. One way of selecting the most
! appropriate choice is to give the user an index page, and let them
! select. However it is often possible for the server to choose
! automatically. This works because browsers can send as part of each
! request information about what representations they prefer. For
! example, a browser could indicate that it would like to see
! information in French, if possible, else English will do. Browsers
! indicate their preferences by headers in the request. To request only
! French representations, the browser would send
!
! <pre>
! Accept-Language: fr
! </pre>
!
! Note that this preference will only be applied when there is a choice
! of representations and they vary by language.
! <p>
!
! As an example of a more complex request, this browser has been
! configured to accept French and English, but prefer French, and to
! accept various media types, preferring HTML over plain text or other
! text types, and prefering GIF or jpeg over other media types, but also
! allowing any other media type as a last resort:
!
! <pre>
! Accept-Language: fr; q=1.0, en; q=0.5
! Accept: text/html; q=1.0, text/*; q=0.8, image/gif; q=0.6,
! image/jpeg; q=0.6, image/*; q=0.5, */*; q=0.1
! </pre>
!
! Apache 1.2 supports 'server driven' content negotiation, as defined in
! the HTTP/1.1 specification. It fully supports the Accept,
! Accept-Language, Accept-Charset and Accept-Encoding request headers.
! <p>
!
! The terms used in content negotiation are: a <b>resource</b> is an
! item which can be requested of a server, which might be selected as
! the result of a content negotiation algorithm. If a resource is
! available in several formats, these are called <b>representations</b>
! or <b>variants</b>. The ways in which the variants for a particular
! resource vary are called the <b>dimensions</b> of negotiation.
!
! <h2>Negotiation in Apache</h2>
!
! In order to negotiate a resource, the server needs to be given
! information about each of the variants. This is done in one of two
! ways:
<ul>
! <li> Using a type map (i.e., a <code>*.var</code> file) which
! names the files containing the variants explicitly
! <li> Or using a 'MultiViews' search, where the server does an implicit
! filename pattern match, and chooses from among the results.
</ul>
! <h3>Using a type-map file</h3>
!
! A type map is a document which is associated with the handler
! named <code>type-map</code> (or, for backwards-compatibility with
! older Apache configurations, the mime type
! <code>application/x-type-map</code>). Note that to use this feature,
! you've got to have an <code>SetHanlder</code> some place which defines a
! file suffix as <code>type-map</code>; this is best done with a
<pre>
! AddHandler type-map var
</pre>
in <code>srm.conf</code>. See comments in the sample config files for
***************
*** 61,85 ****
consist of contiguous RFC822-format header lines. Entries for
different variants are separated by blank lines. Blank lines are
illegal within an entry. It is conventional to begin a map file with
! an entry for the combined entity as a whole, e.g.,
<pre>
! URI: foo; vary="type,language"
URI: foo.en.html
! Content-type: text/html; level=2
Content-language: en
! URI: foo.fr.html
! Content-type: text/html; level=2
! Content-language: fr
!
</pre>
- If the variants have different qualities, that may be indicated by the
- "qs" parameter, as in this picture (available as jpeg, gif, or ASCII-art):
- <pre>
! URI: foo; vary="type,language"
URI: foo.jpeg
Content-type: image/jpeg; qs=0.8
--- 100,126 ----
consist of contiguous RFC822-format header lines. Entries for
different variants are separated by blank lines. Blank lines are
illegal within an entry. It is conventional to begin a map file with
! an entry for the combined entity as a whole (although this
! is not required, and if present will be ignored). An example
! map file is:
<pre>
! URI: foo
URI: foo.en.html
! Content-type: text/html
Content-language: en
! URI: foo.fr.de.html
! Content-type: text/html; charset=iso-8859-2
! Content-language: fr, de
</pre>
! If the variants have different source qualities, that may be indicated
! by the "qs" parameter to the media type, as in this picture (available
! as jpeg, gif, or ASCII-art):
! <pre>
! URI: foo
URI: foo.jpeg
Content-type: image/jpeg; qs=0.8
***************
*** 90,96 ****
URI: foo.txt
Content-type: text/plain; qs=0.01
! </pre><p>
The full list of headers recognized is:
--- 131,142 ----
URI: foo.txt
Content-type: text/plain; qs=0.01
! </pre>
! <p>
!
! qs values can vary between 0.000 and 1.000. Note that any variant with
! a qs value of 0.000 will never be chosen. Variants with no 'qs'
! parameter value are given a qs factor of 1.0. <p>
The full list of headers recognized is:
***************
*** 103,114 ****
client would be granted access if they were to be requested
directly.
<dt> <code>Content-type:</code>
! <dd> media type --- level may be specified, along with "qs". These
are often referred to as MIME types; typical media types are
<code>image/gif</code>, <code>text/plain</code>, or
<code>text/html; level=3</code>.
<dt> <code>Content-language:</code>
! <dd> The language of the variant, specified as an Internet standard
language code (e.g., <code>en</code> for English,
<code>kr</code> for Korean, etc.).
<dt> <code>Content-encoding:</code>
--- 149,160 ----
client would be granted access if they were to be requested
directly.
<dt> <code>Content-type:</code>
! <dd> media type --- charset, level and "qs" parameters may be given. These
are often referred to as MIME types; typical media types are
<code>image/gif</code>, <code>text/plain</code>, or
<code>text/html; level=3</code>.
<dt> <code>Content-language:</code>
! <dd> The languages of the variant, specified as an internet standard
language code (e.g., <code>en</code> for English,
<code>kr</code> for Korean, etc.).
<dt> <code>Content-encoding:</code>
***************
*** 139,145 ****
The effect of <code>MultiViews</code> is as follows: if the server
receives a request for <code>/some/dir/foo</code>, if
<code>/some/dir</code> has <code>MultiViews</code> enabled, and
! <code>/some/dir/foo</code> does *not* exist, then the server reads the
directory looking for files named foo.*, and effectively fakes up a
type map which names all those files, assigning them the same media
types and content-encodings it would have if the client had asked for
--- 185,191 ----
The effect of <code>MultiViews</code> is as follows: if the server
receives a request for <code>/some/dir/foo</code>, if
<code>/some/dir</code> has <code>MultiViews</code> enabled, and
! <code>/some/dir/foo</code> does <em>not</em> exist, then the server reads the
directory looking for files named foo.*, and effectively fakes up a
type map which names all those files, assigning them the same media
types and content-encodings it would have if the client had asked for
***************
*** 161,213 ****
<p>
! If one of the files found by the globbing is a CGI script, it's not
! obvious what should happen. My code gives that case gets special
! treatment --- if the request was a POST, or a GET with QUERY_ARGS or
! PATH_INFO, the script is given an extremely high quality rating, and
! generally invoked; otherwise it is given an extremely low quality
! rating, which generally causes one of the other views (if any) to be
! retrieved. This is the only jiggering of quality ratings done by the
! MultiViews code; aside from that, all Qualities in the synthesized
! type maps are 1.0.
<p>
! <B>New as of 0.8:</B> Documents in multiple languages can also be resolved through the use
! of the <code>AddLanguage</code> and <code>LanguagePriority</code>
! directives:
<pre>
! AddLanguage en .en
! AddLanguage fr .fr
! AddLanguage de .de
! AddLanguage da .da
! AddLanguage el .el
! AddLanguage it .it
!
! # LanguagePriority allows you to give precedence to some languages
! # in case of a tie during content negotiation.
! # Just list the languages in decreasing order of preference.
! LanguagePriority en fr de
</pre>
! Here, a request for "foo.html" matched against "foo.html.en" and
! "foo.html.fr" would return an French document to a browser that
! indicated a preference for French, or an English document otherwise.
! In fact, a request for "foo" matched against "foo.html.en",
! "foo.html.fr", "foo.ps.en", "foo.pdf.de", and "foo.txt.it" would do
! just what you expect - treat those suffices as a database and compare
! the request to it, returning the best match. The languages and data
! types share the same suffix name space.
<p>
! Note that this machinery only comes into play if the file which the
! user attempted to retrieve does <em>not</em> exist by that name; if it
! does, it is simply retrieved as usual. (So, someone who actually asks
! for <code>foo.jpeg</code>, as opposed to <code>foo</code>, never gets
! <code>foo.gif</code>).
<!--#include virtual="footer.html" -->
! </body> </html>
--- 207,420 ----
<p>
! If one of the files found when reading the directive is a CGI script,
! it's not obvious what should happen. The code gives that case
! special treatment --- if the request was a POST, or a GET with
! QUERY_ARGS or PATH_INFO, the script is given an extremely high quality
! rating, and generally invoked; otherwise it is given an extremely low
! quality rating, which generally causes one of the other views (if any)
! to be retrieved.
!
! <h2>The Negotiation Algorithm</h2>
!
! After Apache has obtained a list of the variants for a given resource,
! either from a type-map file or from the filenames in the directory, it
! applies a algorithm to decide on the 'best' variant to return, if
! any. To do this it calculates a quality value for each variant in each
! of the dimensions of variance. It is not necessary to know any of the
! details of how negotaion actually takes place in order to use Apache's
! content negotation features. However the rest of this document
! explains in detail the algorithm used for those interested. <p>
!
! In some circumstances, Apache can 'fiddle' the quality factor of a
! particular dimension to achive a better result. The ways Apache can
! fiddle quality factors is explained in more detail below.
!
! <h3>Dimensions of Negotation</h3>
!
! <table>
! <tr><th>Dimension
! <th>Notes
! <tr><td>Media Type
! <td>Browser indicates preferences on Accept: header. Each item
! can have an associate quality factor. Variant description can also
! have a quality factor.
! <tr><td>Language
! <td>Browser indicates preferneces on Accept-Language: header. Each
! item
! can have a quality factor. Variants can be associated with none, one
! or more languages.
! <tr><td>Encoding
! <td>Browser indicates preference with Accept-Encoding: header.
! <tr><td>Charset
! <td>Browser indicates preference with Accept-Charset: header. Variant
! can indicate a charset as a parameter of the media type.
! </table>
!
! <h3>Apache Negotiation Algorithm</h3>
!
! Apache uses an algorithm to select the 'best' variant (if any) to
! return to the browser. This algorithm is not configurable. It operates
! like this:
! <p>
+ <ol>
+ <li>
+ Firstly, for each dimension of the negotiation, the appropriate
+ Accept header is checked and a quality assigned to this each
+ variant. If the Accept header for any dimension means that this
+ variant is not acceptable, eliminate it. If no variants remain, go
+ to step 4.
+
+ <li>Select the 'best' variant by a process of elimination. Each of
+ the following tests is applied in order. Any variants not selected at
+ each stage are eliminated. After each test, if only one variant
+ remains, it is selected as the best match. If more than one variant
+ remains, move onto the next test.
+
+ <ol>
+ <li>Multiply the quality factor from the Accept header with the
+ quality-of-source factor for this variant's media type, and select
+ the variants with the highest value
+
+ <li>Select the variants with the highest language quality factor
+
+ <li>Select the variants with the best language match, using either the
+ order of languages on the LanguagePriority directive (if present),
+ else the order of languages on the Accept-Language header.
+
+ <li>Select the variants with the highest 'level' media parameter
+ (used to give the version of text/html media types).
+
+ <li>Select only unencoded variants, if there is a mix of encoded
+ and non-encoded variants. If either all variants are encoded
+ or all variants are not encoded, select all.
+
+ <li>Select only variants with acceptable charset media parameters,
+ as given on the Accept-Charset header line. Charset ISO-8859-1
+ is always acceptable. Variants not associated with a particular
+ charset are assumed to be in ISO-8859-1.
+
+ <li>Select the variants with the smallest content length
+
+ <li>Select the first variant of those remaining (this will be either the
+ first listed in the type-map file, or the first read from the directory)
+ and go to stage 3.
+
+ </ol>
+
+ <li>The algorithm has now select one 'best' variant, so return
+ it as the response. The HTTP header Vary is set to indicate the
+ dimensions of negotations (browsers and caches can use this
+ information when caching the resource). End.
+
+ <li>To get here means no variant was selected (because non are acceptable
+ to the browser. Return a 406 status (meaning "No acceptable representation")
+ with a response body consisting of an HTML document listing the
+ available variants. Also set the HTTP Vary header to indicate the
+ dimensions of variance.
+
+ </ol>
+ <h2><a name="better">Fiddling with Quality Values</a></h2>
+
+ Apache sometimes changes the quality values from what would be
+ expected by a strict interpretation of the algorithm above. This is to
+ get a netter result from the algorithm for browsers which do not send
+ full or accurate information. Some of the most popular browsers send
+ Accept header information which would otherwise result in the
+ selection of the wrong variant in many cases. If a browser
+ sends full and correct information these fiddles will not
+ be applied.
<p>
! <h3>Media Types and Wildcards</h3>
+ The Accept: request header indicates preferneces for media types. It
+ can also include 'wildcard' media types, such as "image/*" or "*/*"
+ where the * matches any string. So a request including:
<pre>
! Accept: image/*, */*
! </pre>
! would indicate that any type starting "image/" would be acceptable,
! as would any other type (so the first "image/*" is redundant). Some
! browsers routinly send wildcards in addition to explicit types they
! can handle. For example:
! <pre>
! Accept: text/html, text/plain, image/gif, image/jpeg, */*
</pre>
! The intention of this result is to indicate that the explicitly
! listed types are preferred, but if a different representation is
! available, that is ok too. However under the basic algoryth, as given
! above, the */* wildcard has exactly equal preference to all the other
! types, so they are not being preferred. The browser should really have
! sent a request with a lower quality (preference) value for *.*, such
! as:
! <pre>
! Accept: text/html, text/plain, image/gif, image/jpeg, */*; q=0.01
! </pre>
+ The explicit types have no quality factor, so they default to a
+ preference of 1.0 (the highest). The wildcard */* is given
+ a low preference of 0.01, so other types will only be returned if
+ no variant matches an explicitly listed type.
<p>
! If the Accept: header contains <i>no</i> q factors at all, Apache sets
! the q value of "*/*", if present, to 0.01 to emulate the desired
! behaviour. It also sets the q value of wildcards of the format
! "type/*" to 0.02 (so these are preferred over matches against
! "*/*". If any media type on the Accept: header contains a q factor,
! these special values are <i>not</i> applied, so requests from browsers
! which send the correct information to start with work as expected.
!
! <h3>Variants with no Language</h3>
!
! If some of the variants for a particular resource have a language
! attribute, and some do not, those variants with no language
! are given a very low language quality factor of 0.001.<p>
!
! The reason for setting this language quality factor for
! variant with no language to a very low value is to allow
! for a default variant which can be supplied if none of the
! other variants match the browser's language preferences.
!
! For example, consider the situation with three variants:
!
! <ul>
! <li>foo.en.html, language en
! <li>foo.fr.html, language en
! <li>foo.html, no language
! </ul>
!
! The meaning of a variant with no language is that it is
! always acceptable to the browser. If the request Accept-Language
! header includes either en or fr (or both) one of foo.en.html
! or foo.fr.html will be returned. If the browser does not list
! either en or fr as acceptable, foo.html will be returned instead.
!
! <h2>Note on Caching</h2>
!
! When a cache stores a document, it associates it with the request URL.
! The next time that URL is requested, the cache can use the stored
! document, provided it is still within date. But if the resource is
! subject to content negotiation at the server, this would result in
! only the first requested variant being cached, and subsequent cache
! hits could return the wrong response. To prevent this, by default
! Apache marks all response that are returned after content negotiation
! as non-cacheable. Unfortunately, this can increase network traffic by
! requiring the resouce to be obtained from the original server evry
! time. The HTTP/1.1 protocol includes features to make this much more
! efficient, by allowing cacheing. <p>
!
! For requrests which come from a HTTP/1.0 compliant client (either a
! browser or a cache), the directive <tt>CacheNegotiatedDocs</tt> can be
! used to allow caching of responses which were subject to negotiation.
! This directive can be given in the server config or virtual host, and
! takes no arguments. It has no effect on requests from HTTP/1.1
! clients.
<!--#include virtual="footer.html" -->
! </BODY>
! </HTML>