Posted to cvs@httpd.apache.org by Paul Sutton <pc...@hyperreal.com> on 1996/12/20 17:13:16 UTC

cvs commit: apache/htdocs/manual content-negotiation.html

pcs         96/12/20 08:13:15

  Modified:    htdocs/manual  content-negotiation.html
  Log:
  Expand documentation of content negotiation for Apache 1.2 including
  HTTP/1.1 stuff. Document the algorithm apache uses to choose a variant.
  
  Revision  Changes    Path
  1.5       +304 -97   apache/htdocs/manual/content-negotiation.html
  
  Index: content-negotiation.html
  ===================================================================
  RCS file: /export/home/cvs/apache/htdocs/manual/content-negotiation.html,v
  retrieving revision 1.4
  retrieving revision 1.5
  diff -C3 -r1.4 -r1.5
  *** content-negotiation.html	1996/12/12 01:09:38	1.4
  --- content-negotiation.html	1996/12/20 16:13:14	1.5
  ***************
  *** 1,57 ****
  ! <html>
  ! <head>
  ! <title>Apache server Content arbitration: MultiViews and *.var files</title>
  ! </head>
    
  ! <body>
    <!--#include virtual="header.html" -->
  ! <h1>Content Arbitration:  MultiViews and *.var files</h1>
    
  ! The HTTP standard allows clients (i.e., browsers like Mosaic or
  ! Netscape) to specify what data formats they are prepared to accept.
  ! The intention is that when information is available in multiple
  ! variants (e.g., in different data formats), servers can use this
  ! information to decide which variant to send.  This feature has been
  ! supported in the CERN server for a while, and while it is not yet
  ! supported in the NCSA server, it is likely to assume a new importance
  ! in light of the emergence of HTML3 capable browsers. <p>
  ! 
  ! The Apache module <A HREF="mod/mod_negotiation.html">mod_negotiation</A> handles
  ! content negotiation in two different ways; special treatment for the
  ! pseudo-mime-type <code>application/x-type-map</code>, and the
  ! MultiViews per-directory Option (which can be set in srm.conf, or in
  ! .htaccess files, as usual).  These features are alternate user
  ! interfaces to what amounts to the same piece of code (in the new file
  ! <code>http_mime_db.c</code>) which implements the content negotiation
  ! portion of the HTTP protocol. <p>
  ! 
  ! Each of these features allows one of several files to satisfy a
  ! request, based on what the client says it's willing to accept; the
  ! differences are in the way the files are identified:
    
    <ul>
  !   <li> A type map (i.e., a <code>*.var</code> file) names the files
  !        containing the variants explicitly
  !   <li> In a MultiViews search, the server does an implicit filename
  !        pattern match, and chooses from among the results.
    </ul>
    
  ! Apache also supports a new pseudo-MIME type,
  ! text/x-server-parsed-html3, which is treated as text/html;level=3
  ! for purposes of content negotiation, and as server-side-included HTML
  ! elsewhere. 
  ! 
  ! <h3>Type maps (*.var files)</h3>
  ! 
  ! A type map is a document which is typed by the server (using its
  ! normal suffix-based mechanisms) as
  ! <code>application/x-type-map</code>.  Note that to use this feature,
  ! you've got to have an <code>AddType</code> some place which defines a
  ! file suffix as <code>application/x-type-map</code>; the easiest thing
  ! may be to stick a
    <pre>
    
  !   AddType application/x-type-map var
    
    </pre>
    in <code>srm.conf</code>.  See comments in the sample config files for
  --- 1,96 ----
  ! <!DOCTYPE HTML PUBLIC "-//IETF//DTD HTML 2.0//EN">
  ! <HTML>
  ! <HEAD>
  ! <TITLE>Apache Content Negotiation</TITLE>
  ! </HEAD>
    
  ! <BODY>
    <!--#include virtual="header.html" -->
  ! <h1>Content Negotiation</h1>
    
  ! Apache's support for content negotiation has been updated to meet the
  ! HTTP/1.1 specification. It can choose the best representation of a
  ! resource based on the browser-supplied preferences for media type,
  ! languages, character set and encoding.  It also implements a
  ! couple of features to give more intelligent handling of requests from
  ! browsers which send incomplete negotiation information.  <p>
  ! 
  ! Content negotiation is provided by the 
  ! <a href="mod/mod_negotiation.html">mod_negotiation</a> module,
  ! which is compiled in by default.
  ! 
  ! <hr>
  ! 
  ! <h2>About Content Negotiation</h2>
  ! 
  ! A resource may be available in several different representations. For
  ! example, it might be available in different languages or different
  ! media types, or a combination.  One way of selecting the most
  ! appropriate choice is to give the user an index page, and let them
  ! select. However it is often possible for the server to choose
  ! automatically. This works because browsers can send as part of each
  ! request information about what representations they prefer. For
  ! example, a browser could indicate that it would like to see
  ! information in French, if possible, else English will do. Browsers
  ! indicate their preferences by headers in the request. To request only
  ! French representations, the browser would send
  ! 
  ! <pre>
  !   Accept-Language: fr
  ! </pre>
  ! 
  ! Note that this preference will only be applied when there is a choice
  ! of representations and they vary by language. 
  ! <p>
  ! 
  ! As an example of a more complex request, this browser has been
  ! configured to accept French and English, but prefer French, and to
  ! accept various media types, preferring HTML over plain text or other
  ! text types, and preferring GIF or jpeg over other media types, but also
  ! allowing any other media type as a last resort:
  ! 
  ! <pre>
  !   Accept-Language: fr; q=1.0, en; q=0.5
  !   Accept: text/html; q=1.0, text/*; q=0.8, image/gif; q=0.6,
  !         image/jpeg; q=0.6, image/*; q=0.5, */*; q=0.1
  ! </pre>
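
As a rough illustration of how headers like the ones above break down, here is a minimal Python sketch (not Apache's own code) that splits an Accept-style header into items and q values. The function name parse_accept is hypothetical, and the sketch deliberately ignores parameters other than q (such as level).

```python
def parse_accept(header):
    """Split an Accept-style header into (value, q) pairs, highest q first."""
    items = []
    for part in header.split(","):
        fields = [f.strip() for f in part.split(";")]
        value, q = fields[0], 1.0  # q defaults to 1.0 when it is omitted
        for param in fields[1:]:
            name, _, raw = param.partition("=")
            if name.strip().lower() == "q":
                q = float(raw)
        if value:
            items.append((value, q))
    # sorted() is stable, so items with equal q keep their header order
    return sorted(items, key=lambda item: item[1], reverse=True)


languages = parse_accept("fr; q=1.0, en; q=0.5")
```

Here languages lists fr ahead of en, matching the stated preference for French.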
  ! 
  ! Apache 1.2 supports 'server driven' content negotiation, as defined in
  ! the HTTP/1.1 specification. It fully supports the Accept,
  ! Accept-Language, Accept-Charset and Accept-Encoding request headers.
  ! <p>
  ! 
  ! The terms used in content negotiation are: a <b>resource</b> is an
  ! item which can be requested of a server, which might be selected as
  ! the result of a content negotiation algorithm. If a resource is
  ! available in several formats, these are called <b>representations</b>
  ! or <b>variants</b>. The ways in which the variants for a particular
  ! resource vary are called the <b>dimensions</b> of negotiation.
  ! 
  ! <h2>Negotiation in Apache</h2>
  ! 
  ! In order to negotiate a resource, the server needs to be given
  ! information about each of the variants. This is done in one of two
  ! ways: 
    
    <ul>
  !   <li> Using a type map (i.e., a <code>*.var</code> file) which
  !        names the files containing the variants explicitly
  !   <li> Or using a 'MultiViews' search, where the server does an implicit 
  !        filename pattern match, and chooses from among the results.
    </ul>
    
  ! <h3>Using a type-map file</h3>
  ! 
  ! A type map is a document which is associated with the handler
  ! named <code>type-map</code> (or, for backwards-compatibility with
  ! older Apache configurations, the mime type
  ! <code>application/x-type-map</code>).  Note that to use this feature,
  ! you've got to have an <code>AddHandler</code> some place which defines a
  ! file suffix as <code>type-map</code>; this is best done with a
    <pre>
    
  !   AddHandler type-map var
    
    </pre>
    in <code>srm.conf</code>.  See comments in the sample config files for
  ***************
  *** 61,85 ****
    consist of contiguous RFC822-format header lines.  Entries for
    different variants are separated by blank lines.  Blank lines are
    illegal within an entry.  It is conventional to begin a map file with
  ! an entry for the combined entity as a whole, e.g.,
    <pre>
    
  !   URI: foo; vary="type,language"
    
      URI: foo.en.html
  !   Content-type: text/html; level=2
      Content-language: en
    
  !   URI: foo.fr.html
  !   Content-type: text/html; level=2
  !   Content-language: fr
  ! 
    </pre>
  - If the variants have different qualities, that may be indicated by the
  - "qs" parameter, as in this picture (available as jpeg, gif, or ASCII-art):
  - <pre>
    
  !   URI: foo; vary="type,language"
    
      URI: foo.jpeg
      Content-type: image/jpeg; qs=0.8
  --- 100,126 ----
    consist of contiguous RFC822-format header lines.  Entries for
    different variants are separated by blank lines.  Blank lines are
    illegal within an entry.  It is conventional to begin a map file with
  ! an entry for the combined entity as a whole (although this
  ! is not required, and if present will be ignored). An example
  ! map file is:
    <pre>
    
  !   URI: foo
    
      URI: foo.en.html
  !   Content-type: text/html
      Content-language: en
    
  !   URI: foo.fr.de.html
  !   Content-type: text/html; charset=iso-8859-2
  !   Content-language: fr, de
    </pre>
    
  ! If the variants have different source qualities, that may be indicated
  ! by the "qs" parameter to the media type, as in this picture (available
  ! as jpeg, gif, or ASCII-art):
  ! <pre>
  !   URI: foo
    
      URI: foo.jpeg
      Content-type: image/jpeg; qs=0.8
  ***************
  *** 90,96 ****
      URI: foo.txt
      Content-type: text/plain; qs=0.01
    
  ! </pre><p>
    
    The full list of headers recognized is:
    
  --- 131,142 ----
      URI: foo.txt
      Content-type: text/plain; qs=0.01
    
  ! </pre>
  ! <p>
  ! 
  ! qs values can vary between 0.000 and 1.000. Note that any variant with
  ! a qs value of 0.000 will never be chosen. Variants with no 'qs'
  ! parameter value are given a qs factor of 1.0.  <p>
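
The map-file layout and the qs default can be sketched in a few lines of Python. This is an illustrative reading of the format described in this document, not the parser Apache uses; the names parse_type_map and source_quality are hypothetical, and RFC822 continuation lines are not handled.

```python
def parse_type_map(text):
    """Split a type map into entries: one dict of lower-cased headers each."""
    entries, current = [], {}
    for line in text.splitlines():
        line = line.strip()
        if not line:
            # a blank line ends the current entry
            if current:
                entries.append(current)
                current = {}
            continue
        name, _, value = line.partition(":")
        current[name.strip().lower()] = value.strip()
    if current:
        entries.append(current)
    return entries


def source_quality(entry):
    """Return the qs parameter of an entry's Content-type, defaulting to 1.0."""
    for param in entry.get("content-type", "").split(";")[1:]:
        name, _, raw = param.partition("=")
        if name.strip().lower() == "qs":
            return float(raw)
    return 1.0


SAMPLE = """
URI: foo

URI: foo.jpeg
Content-type: image/jpeg; qs=0.8

URI: foo.txt
Content-type: text/plain; qs=0.01
"""
entries = parse_type_map(SAMPLE)
```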
    
    The full list of headers recognized is:
    
  ***************
  *** 103,114 ****
           client would be granted access if they were to be requested
           directly. 
      <dt> <code>Content-type:</code>
  !   <dd> media type --- level may be specified, along with "qs".  These
           are often referred to as MIME types; typical media types are
           <code>image/gif</code>, <code>text/plain</code>, or
           <code>text/html;&nbsp;level=3</code>.
      <dt> <code>Content-language:</code>
  !   <dd> The language of the variant, specified as an Internet standard
           language code (e.g., <code>en</code> for English,
           <code>kr</code> for Korean, etc.).
      <dt> <code>Content-encoding:</code>
  --- 149,160 ----
           client would be granted access if they were to be requested
           directly. 
      <dt> <code>Content-type:</code>
  !   <dd> media type --- charset, level and "qs" parameters may be given.  These
           are often referred to as MIME types; typical media types are
           <code>image/gif</code>, <code>text/plain</code>, or
           <code>text/html;&nbsp;level=3</code>.
      <dt> <code>Content-language:</code>
  !   <dd> The languages of the variant, specified as an internet standard
           language code (e.g., <code>en</code> for English,
           <code>kr</code> for Korean, etc.).
      <dt> <code>Content-encoding:</code>
  ***************
  *** 139,145 ****
    The effect of <code>MultiViews</code> is as follows: if the server
    receives a request for <code>/some/dir/foo</code>, if
    <code>/some/dir</code> has <code>MultiViews</code> enabled, and
  ! <code>/some/dir/foo</code> does *not* exist, then the server reads the
    directory looking for files named foo.*, and effectively fakes up a
    type map which names all those files, assigning them the same media
    types and content-encodings it would have if the client had asked for
  --- 185,191 ----
    The effect of <code>MultiViews</code> is as follows: if the server
    receives a request for <code>/some/dir/foo</code>, if
    <code>/some/dir</code> has <code>MultiViews</code> enabled, and
  ! <code>/some/dir/foo</code> does <em>not</em> exist, then the server reads the
    directory looking for files named foo.*, and effectively fakes up a
    type map which names all those files, assigning them the same media
    types and content-encodings it would have if the client had asked for
  ***************
  *** 161,213 ****
    
    <p>
    
  ! If one of the files found by the globbing is a CGI script, it's not
  ! obvious what should happen.  My code gives that case special
  ! treatment --- if the request was a POST, or a GET with QUERY_ARGS or
  ! PATH_INFO, the script is given an extremely high quality rating, and
  ! generally invoked; otherwise it is given an extremely low quality
  ! rating, which generally causes one of the other views (if any) to be
  ! retrieved.  This is the only jiggering of quality ratings done by the
  ! MultiViews code; aside from that, all Qualities in the synthesized
  ! type maps are 1.0.
    
    <p>
    
  ! <B>New as of 0.8:</B> Documents in multiple languages can also be resolved through the use
  ! of the <code>AddLanguage</code> and <code>LanguagePriority</code> 
  ! directives:
    
    <pre>
  ! AddLanguage en .en
  ! AddLanguage fr .fr
  ! AddLanguage de .de
  ! AddLanguage da .da
  ! AddLanguage el .el
  ! AddLanguage it .it
  ! 
  ! # LanguagePriority allows you to give precedence to some languages
  ! # in case of a tie during content negotiation.
  ! # Just list the languages in decreasing order of preference.
    
  ! LanguagePriority en fr de
    </pre>
    
  ! Here, a request for "foo.html" matched against "foo.html.en" and
  ! "foo.html.fr" would return a French document to a browser that
  ! indicated a preference for French, or an English document otherwise.
  ! In fact, a request for "foo" matched against "foo.html.en",
  ! "foo.html.fr", "foo.ps.en", "foo.pdf.de", and "foo.txt.it" would do
  ! just what you expect - treat those suffices as a database and compare
  ! the request to it, returning the best match.  The languages and data
  ! types share the same suffix name space.
    
    <p>
    
  ! Note that this machinery only comes into play if the file which the
  ! user attempted to retrieve does <em>not</em> exist by that name; if it
  ! does, it is simply retrieved as usual.  (So, someone who actually asks
  ! for <code>foo.jpeg</code>, as opposed to <code>foo</code>, never gets
  ! <code>foo.gif</code>).
    
    <!--#include virtual="footer.html" -->
  ! </body> </html>
  --- 207,420 ----
    
    <p>
    
  ! If one of the files found when reading the directory is a CGI script,
  ! it's not obvious what should happen.  The code gives that case
  ! special treatment --- if the request was a POST, or a GET with
  ! QUERY_ARGS or PATH_INFO, the script is given an extremely high quality
  ! rating, and generally invoked; otherwise it is given an extremely low
  ! quality rating, which generally causes one of the other views (if any)
  ! to be retrieved.
  ! 
  ! <h2>The Negotiation Algorithm</h2>
  ! 
  ! After Apache has obtained a list of the variants for a given resource,
  ! either from a type-map file or from the filenames in the directory, it
  ! applies an algorithm to decide on the 'best' variant to return, if
  ! any. To do this it calculates a quality value for each variant in each
  ! of the dimensions of variance. It is not necessary to know any of the
  ! details of how negotiation actually takes place in order to use Apache's
  ! content negotiation features. However the rest of this document
  ! explains in detail the algorithm used for those interested.  <p>
  ! 
  ! In some circumstances, Apache can 'fiddle' the quality factor of a
  ! particular dimension to achieve a better result. The ways Apache can
  ! fiddle quality factors are explained in more detail below.
  ! 
  ! <h3>Dimensions of Negotiation</h3>
  ! 
  ! <table>
  ! <tr><th>Dimension
  ! <th>Notes
  ! <tr><td>Media Type
  ! <td>Browser indicates preferences on Accept: header. Each item
  ! can have an associated quality factor. Variant description can also
  ! have a quality factor.
  ! <tr><td>Language
  ! <td>Browser indicates preferences on Accept-Language: header. Each
  ! item
  ! can have a quality factor. Variants can be associated with none, one
  ! or more languages.
  ! <tr><td>Encoding
  ! <td>Browser indicates preference with Accept-Encoding: header.
  ! <tr><td>Charset
  ! <td>Browser indicates preference with Accept-Charset: header. Variant
  ! can indicate a charset as a parameter of the media type.
  ! </table>
  ! 
  ! <h3>Apache Negotiation Algorithm</h3>
  ! 
  ! Apache uses an algorithm to select the 'best' variant (if any) to
  ! return to the browser. This algorithm is not configurable. It operates
  ! like this:
  ! <p>
    
  + <ol>
  + <li>
  + Firstly, for each dimension of the negotiation, the appropriate
  + Accept header is checked and a quality assigned to each
  + variant. If the Accept header for any dimension means that this
  + variant is not acceptable, eliminate it. If no variants remain, go
  + to step 4.
  + 
  + <li>Select the 'best' variant by a process of elimination. Each of
  + the following tests is applied in order. Any variants not selected at
  + each stage are eliminated. After each test, if only one variant
  + remains, it is selected as the best match. If more than one variant
  + remains, move on to the next test.
  + 
  + <ol>
  + <li>Multiply the quality factor from the Accept header with the
  +   quality-of-source factor for this variant's media type, and select
  +   the variants with the highest value
  + 
  + <li>Select the variants with the highest language quality factor
  + 
  + <li>Select the variants with the best language match, using either the
  +   order of languages on the LanguagePriority directive (if present),
  +   else the order of languages on the Accept-Language header.
  + 
  + <li>Select the variants with the highest 'level' media parameter
  +   (used to give the version of text/html media types). 
  + 
  + <li>Select only unencoded variants, if there is a mix of encoded
  +   and non-encoded variants. If either all variants are encoded
  +   or all variants are not encoded, select all.
  + 
  + <li>Select only variants with acceptable charset media parameters,
  +   as given on the Accept-Charset header line. Charset ISO-8859-1
  +   is always acceptable. Variants not associated with a particular
  +   charset are assumed to be in ISO-8859-1.
  + 
  + <li>Select the variants with the smallest content length
  + 
  + <li>Select the first variant of those remaining (this will be either the
  + first listed in the type-map file, or the first read from the directory)
  + and go to step 3.
  + 
  + </ol>
  + 
  + <li>The algorithm has now selected one 'best' variant, so return
  +   it as the response. The HTTP header Vary is set to indicate the
  +   dimensions of negotiation (browsers and caches can use this
  +   information when caching the resource). End.
  + 
  + <li>To get here means no variant was selected (because none are acceptable
  +   to the browser). Return a 406 status (meaning "No acceptable representation")
  +   with a response body consisting of an HTML document listing the
  +   available variants. Also set the HTTP Vary header to indicate the
  +   dimensions of variance.
  + 
  + </ol>
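
The elimination steps listed above can be sketched in Python as follows. This is a simplified illustration under stated assumptions, not Apache's implementation: variants are plain dicts with hypothetical keys (type, qs, langs, level, encoding, charset, length), the Accept headers are already parsed into dicts mapping items to q values, variants with no language are treated as fully acceptable, and the charset test is not allowed to empty the list.

```python
def media_q(mtype, accept):
    """Quality for mtype: an exact match beats type/*, which beats */*."""
    major = mtype.split("/")[0]
    for pattern in (mtype, major + "/*", "*/*"):
        if pattern in accept:
            return accept[pattern]
    # Assumption: with no Accept header at all, every type is acceptable.
    return 0.0 if accept else 1.0


def keep_best(variants, score):
    """Keep only the variants that share the highest score."""
    best = max(score(v) for v in variants)
    return [v for v in variants if score(v) == best]


def negotiate(variants, accept, accept_lang, accept_charset=(), priority=None):
    """Return the chosen variant dict, or None where the server would send 406."""

    def lang_q(v):
        # Simplification: a variant with no language counts as fully acceptable.
        if not accept_lang or not v.get("langs"):
            return 1.0
        return max(accept_lang.get(lang, 0.0) for lang in v["langs"])

    def lang_rank(v):
        # LanguagePriority order if given, else Accept-Language order.
        order = list(priority or accept_lang)
        ranks = [order.index(lang) for lang in v.get("langs", []) if lang in order]
        return -min(ranks) if ranks else -len(order)

    def charset_ok(vs):
        if not accept_charset:
            return vs
        allowed = {c.lower() for c in accept_charset} | {"iso-8859-1"}
        kept = [v for v in vs if v.get("charset", "iso-8859-1").lower() in allowed]
        return kept or vs  # assumption: never let this test empty the list

    # Step 1: eliminate variants that an Accept header (or a zero qs) rules out.
    live = [v for v in variants
            if media_q(v["type"], accept) > 0
            and v.get("qs", 1.0) > 0
            and lang_q(v) > 0]
    if not live:
        return None  # step 4

    # Step 2: apply the tests in order, stopping once one variant remains.
    tests = [
        lambda vs: keep_best(vs, lambda v: media_q(v["type"], accept) * v.get("qs", 1.0)),
        lambda vs: keep_best(vs, lang_q),
        lambda vs: keep_best(vs, lang_rank),
        lambda vs: keep_best(vs, lambda v: v.get("level", 0)),
        lambda vs: [v for v in vs if not v.get("encoding")] or vs,
        charset_ok,
        lambda vs: keep_best(vs, lambda v: -v.get("length", 0)),
    ]
    for test in tests:
        live = test(live)
        if len(live) == 1:
            break
    return live[0]  # first remaining variant; step 3 returns it
```

Expressing each test as a function from a list of variants to a shorter list keeps the ordering of the tests visible in one place, which is the main point of the sketch.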
  + <h2><a name="better">Fiddling with Quality Values</a></h2>
  + 
  + Apache sometimes changes the quality values from what would be
  + expected by a strict interpretation of the algorithm above. This is to
  + get a better result from the algorithm for browsers which do not send
  + full or accurate information. Some of the most popular browsers send
  + Accept header information which would otherwise result in the
  + selection of the wrong variant in many cases. If a browser
  + sends full and correct information these fiddles will not
  + be applied.
    <p>
    
  ! <h3>Media Types and Wildcards</h3>
    
  + The Accept: request header indicates preferences for media types. It
  + can also include 'wildcard' media types, such as "image/*" or "*/*"
  + where the * matches any string. So a request including:
    <pre>
  !   Accept: image/*, */*
  ! </pre>
    
  ! would indicate that any type starting "image/" would be acceptable,
  ! as would any other type (so the first "image/*" is redundant). Some
  ! browsers routinely send wildcards in addition to explicit types they
  ! can handle. For example:
  ! <pre>
  !   Accept: text/html, text/plain, image/gif, image/jpeg, */*
    </pre>
    
  ! The intention of this request is to indicate that the explicitly
  ! listed types are preferred, but if a different representation is
  ! available, that is ok too. However under the basic algorithm, as given
  ! above, the */* wildcard has exactly equal preference to all the other
  ! types, so they are not being preferred. The browser should really have
  ! sent a request with a lower quality (preference) value for */*, such
  ! as:
  ! <pre>
  !   Accept: text/html, text/plain, image/gif, image/jpeg, */*; q=0.01
  ! </pre>
    
  + The explicit types have no quality factor, so they default to a
  + preference of 1.0 (the highest). The wildcard */* is given
  + a low preference of 0.01, so other types will only be returned if
  + no variant matches an explicitly listed type. 
    <p>
    
  ! If the Accept: header contains <i>no</i> q factors at all, Apache sets
  ! the q value of "*/*", if present, to 0.01 to emulate the desired
  ! behaviour. It also sets the q value of wildcards of the format
  ! "type/*" to 0.02 (so these are preferred over matches against
  ! "*/*"). If any media type on the Accept: header contains a q factor,
  ! these special values are <i>not</i> applied, so requests from browsers
  ! which send the correct information to start with work as expected.
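
A minimal Python sketch of the wildcard adjustment just described, assuming the Accept header has already been split into (media range, q) pairs with q set to None where the browser gave no value. The function name fiddle_wildcards is hypothetical and this is not Apache's own code.

```python
def fiddle_wildcards(accept):
    """Apply the wildcard defaults when the header carried no q values at all."""
    if any(q is not None for _, q in accept):
        # At least one explicit q: trust the browser and only fill in
        # the normal default of 1.0 for items without a q.
        return [(m, 1.0 if q is None else q) for m, q in accept]
    adjusted = []
    for media_range, _ in accept:
        if media_range == "*/*":
            adjusted.append((media_range, 0.01))
        elif media_range.endswith("/*"):
            adjusted.append((media_range, 0.02))
        else:
            adjusted.append((media_range, 1.0))
    return adjusted


header = [("text/html", None), ("image/*", None), ("*/*", None)]
adjusted = fiddle_wildcards(header)
```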
  ! 
  ! <h3>Variants with no Language</h3>
  ! 
  ! If some of the variants for a particular resource have a language
  ! attribute, and some do not, those variants with no language
  ! are given a very low language quality factor of 0.001.<p>
  ! 
  ! The reason for setting this language quality factor for
  ! variants with no language to a very low value is to allow
  ! for a default variant which can be supplied if none of the
  ! other variants match the browser's language preferences. 
  ! 
  ! For example, consider the situation with three variants:
  ! 
  ! <ul>
  ! <li>foo.en.html, language en
  ! <li>foo.fr.html, language fr
  ! <li>foo.html, no language
  ! </ul>
  ! 
  ! The meaning of a variant with no language is that it is
  ! always acceptable to the browser. If the request Accept-Language
  ! header includes either en or fr (or both) one of foo.en.html
  ! or foo.fr.html will be returned. If the browser does not list
  ! either en or fr as acceptable, foo.html will be returned instead.
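
The three-variant example can be worked through with a short Python sketch. It assumes an Accept-Language header is present and already parsed into a dict of language to q value; the names language_quality and pick are hypothetical, and only the language dimension is modelled.

```python
def language_quality(variant_langs, accept_lang, any_variant_has_language):
    """Language quality for one variant, with the 0.001 default described above."""
    if not variant_langs:
        return 0.001 if any_variant_has_language else 1.0
    return max(accept_lang.get(lang, 0.0) for lang in variant_langs)


def pick(variants, accept_lang):
    """Return the name of the variant with the best language quality, or None."""
    has_lang = any(langs for _, langs in variants)
    scored = [(language_quality(langs, accept_lang, has_lang), name)
              for name, langs in variants]
    best_q, best_name = max(scored, key=lambda s: s[0])
    return best_name if best_q > 0 else None


variants = [("foo.en.html", ["en"]), ("foo.fr.html", ["fr"]), ("foo.html", [])]
french_reader = pick(variants, {"fr": 1.0})
german_reader = pick(variants, {"de": 1.0})
```

With these inputs a browser asking for fr gets foo.fr.html, while one asking only for de falls back to foo.html.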
  ! 
  ! <h2>Note on Caching</h2>
  ! 
  ! When a cache stores a document, it associates it with the request URL.
  ! The next time that URL is requested, the cache can use the stored
  ! document, provided it is still within date. But if the resource is
  ! subject to content negotiation at the server, this would result in
  ! only the first requested variant being cached, and subsequent cache
  ! hits could return the wrong response. To prevent this, by default
  ! Apache marks all responses that are returned after content negotiation
  ! as non-cacheable. Unfortunately, this can increase network traffic by
  ! requiring the resource to be obtained from the original server every
  ! time. The HTTP/1.1 protocol includes features to make this much more
  ! efficient, by allowing caching.  <p>
  ! 
  ! For requests which come from an HTTP/1.0 compliant client (either a
  ! browser or a cache), the directive <tt>CacheNegotiatedDocs</tt> can be
  ! used to allow caching of responses which were subject to negotiation.
  ! This directive can be given in the server config or virtual host, and
  ! takes no arguments. It has no effect on requests from HTTP/1.1
  ! clients.
    
    <!--#include virtual="footer.html" -->
  ! </BODY>
  ! </HTML>