Posted to dev@httpd.apache.org by "Robert S. Thau" <rs...@ai.mit.edu> on 1995/03/30 20:49:04 UTC

Content negotiation --- preliminary docs.

Here's some stuff I threw together --- it's basically the existing few
pages, HTMLized, and with a few i's dotted and t's crossed for readers
who aren't protocol weenies.  Randy, could you add something like this
to what's on hyperreal?  Thanks.

rst

<html> <head>
<title>Apache server Content arbitration: MultiViews and *.var files</title>
</head>

<body>
<h1>Content Arbitration:  MultiViews and *.var files</h1>

The HTTP standard allows clients (i.e., browsers like Mosaic or
Netscape) to specify what data formats they are prepared to accept.
The intention is that when information is available in multiple
variants (e.g., in different data formats), servers can use this
information to decide which variant to send.  This feature has been
supported in the CERN server for a while, and while it is not yet
supported in the NCSA server, it is likely to assume a new importance
in light of the emergence of HTML3-capable browsers. <p>
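
As a rough illustration (not code from the server): in HTTP these
preferences normally arrive in an <code>Accept</code> request header,
which this page does not otherwise name.  The Python sketch below
parses such a header into media types and their <code>q</code>
preference values.  The function name <code>parse_accept</code> is
made up for the example, and parameters other than <code>q</code>
(such as <code>level</code>) are ignored for brevity.
<pre>
```python
def parse_accept(header):
    """Parse an Accept header into a list of (media_type, q) pairs.

    Simplified sketch: every parameter except "q" is ignored.
    """
    prefs = []
    for item in header.split(","):
        parts = [p.strip() for p in item.split(";")]
        media_type = parts[0].lower()
        if not media_type:
            continue  # skip empty items such as a trailing comma
        q = 1.0  # a missing q parameter means full preference
        for param in parts[1:]:
            name, _, value = param.partition("=")
            if name.strip().lower() == "q":
                try:
                    q = float(value)
                except ValueError:
                    q = 0.0  # treat an unparsable q as "not wanted"
        prefs.append((media_type, q))
    return prefs


prefs = parse_accept("text/html, image/gif; q=0.5, */*; q=0.1")
print(prefs)
```
</pre>
Here the client prefers HTML, will take a GIF at half preference, and
will settle for anything else only as a last resort. <p>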

Apache handles content negotiation in two different ways: special
treatment for the pseudo-MIME type application/x-type-map, and the
MultiViews per-directory Option (which can be set in srm.conf, or in
.htaccess files, as usual).  These features are alternate user
interfaces to what amounts to the same piece of code (in the new file
<code>http_mime_db.c</code>) which implements the content negotiation
portion of the HTTP protocol. <p>

Each of these features allows one of several files to satisfy a
request, based on what the client says it's willing to accept; the
differences are in the way the files are identified:

<ul>
  <li> A type map (i.e., a <code>*.var</code> file) names the files
       containing the variants explicitly.
  <li> In a MultiViews search, the server does an implicit filename
       pattern match, and chooses from among the results.
</ul>

Apache also supports a new pseudo-MIME type,
text/x-server-processed-html3, which is treated as text/html;level=3
for purposes of content negotiation, and as server-side-included HTML
elsewhere. 

<h2>Type maps (*.var files)</h2>

A type map is a document which is typed by the server (using its
normal suffix-based mechanisms) as
<code>application/x-type-map</code>.  Note that to use this feature,
you've got to have an <code>AddType</code> someplace which defines a
file suffix as <code>application/x-type-map</code>; the easiest thing
may be to stick a
<pre>

  AddType application/x-type-map var

</pre>
in <code>srm.conf</code>.  See comments in the sample config files for
details. <p>

Type map files have an entry for each available variant; these entries
consist of contiguous RFC822-format header lines.  Entries for
different variants are separated by blank lines.  Blank lines are
illegal within an entry.  It is conventional to begin a map file with
an entry for the combined entity as a whole, e.g.,
<pre>

  URI: foo; vary="type,language"

  URI: foo.en.html
  Content-type: text/html; level=2
  Content-language: en

  URI: foo.fr.html
  Content-type: text/html; level=2
  Content-language: fr

</pre>
If the variants have different qualities, that may be indicated by the
"qs" parameter, as in this picture (available as jpeg, gif, or ASCII-art):
<pre>

  URI: foo; vary="type,language"

  URI: foo.jpeg
  Content-type: image/jpeg; qs=0.8

  URI: foo.gif
  Content-type: image/gif; qs=0.5

  URI: foo.txt
  Content-type: text/plain; qs=0.01

</pre><p>
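
To make the format concrete, here is a rough Python sketch of how a
map like the one above could be read: entries are split on blank
lines, and each header line is split at its first colon.  This is an
illustration only, not the server's parser; the names
<code>parse_type_map</code> and <code>split_params</code> are
invented for the example, and folded (continued) header lines are not
handled.
<pre>
```python
def parse_type_map(text):
    """Split type-map text into entries, one dict of headers per entry."""
    entries = []
    current = {}
    for line in text.splitlines():
        if not line.strip():
            # A blank line ends the current entry.
            if current:
                entries.append(current)
                current = {}
            continue
        name, _, value = line.partition(":")
        current[name.strip().lower()] = value.strip()
    if current:
        entries.append(current)
    return entries


def split_params(value):
    """Split 'image/jpeg; qs=0.8' into ('image/jpeg', {'qs': '0.8'})."""
    parts = [p.strip() for p in value.split(";")]
    params = {}
    for part in parts[1:]:
        key, _, val = part.partition("=")
        params[key.strip().lower()] = val.strip().strip('"')
    return parts[0], params


SAMPLE = """\
URI: foo; vary="type,language"

URI: foo.jpeg
Content-type: image/jpeg; qs=0.8

URI: foo.gif
Content-type: image/gif; qs=0.5

URI: foo.txt
Content-type: text/plain; qs=0.01
"""

entries = parse_type_map(SAMPLE)
for entry in entries[1:]:
    print(entry["uri"], split_params(entry["content-type"]))
```
</pre>
The first entry describes the combined entity; the remaining three
are the variants themselves. <p>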

The full list of headers recognized is:

<dl>
  <dt> <code>URI:</code>
  <dd> URI of the file containing the variant (of the given media
       type, encoded with the given content encoding).  These are
       interpreted as URLs relative to the map file; they must be on
       the same server (!), and they must refer to files to which the
       client would be granted access if they were to be requested
       directly.
  <dt> <code>Content-type:</code>
  <dd> media type --- level may be specified, along with "qs".  These
       are often referred to as MIME types; typical media types are
       <code>image/gif</code>, <code>text/plain</code>, or
       <code>text/html;&nbsp;level=3</code>.
  <dt> <code>Content-language:</code>
  <dd> The language of the variant, specified as an internet standard
       language code (e.g., <code>en</code> for English,
       <code>ko</code> for Korean, etc.).
  <dt> <code>Content-encoding:</code>
  <dd> If the file is compressed, or otherwise encoded, rather than
       containing the actual raw data, this says how that was done.
       For compressed files (the only case where this generally comes
       up), content encoding should be
       <code>x-compress</code>, or <code>gzip</code>, as appropriate.
  <dt> <code>Content-length:</code>
  <dd> The size of the file.  Clients can ask to receive a given media
       type only if the variant isn't too big; specifying a content
       length in the map allows the server to compare against these
       thresholds without checking the actual file.
</dl>
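
<p>
One plausible way to combine these headers with the client's stated
preferences is sketched below in Python.  It assumes a simple rule:
multiply the client's <code>q</code> for a media type by the
variant's <code>qs</code>, and pick the highest product.  That rule,
and the names <code>client_q</code> and <code>best_variant</code>,
are assumptions made for illustration; the server's real algorithm
may also weigh language, encoding, and length thresholds, all of
which are ignored here.
<pre>
```python
def client_q(media_type, accept):
    """Return the client's q for media_type, honoring type/* and */* ranges."""
    major = media_type.split("/")[0]
    for pattern in (media_type, major + "/*", "*/*"):
        if pattern in accept:
            return accept[pattern]
    return 0.0  # not mentioned at all: treat as unacceptable


def best_variant(variants, accept):
    """Pick the variant with the highest q * qs, or None if none is acceptable."""
    best, best_score = None, 0.0
    for variant in variants:
        score = client_q(variant["type"], accept) * variant.get("qs", 1.0)
        if score > best_score:
            best, best_score = variant, score
    return best


variants = [
    {"uri": "foo.jpeg", "type": "image/jpeg", "qs": 0.8},
    {"uri": "foo.gif", "type": "image/gif", "qs": 0.5},
    {"uri": "foo.txt", "type": "text/plain", "qs": 0.01},
]

# Hypothetical client: loves GIF, tolerates JPEG, barely accepts anything else.
accept = {"image/gif": 1.0, "image/jpeg": 0.5, "*/*": 0.1}
choice = best_variant(variants, accept)
print(choice["uri"])
```
</pre>
With these numbers the GIF wins (1.0 * 0.5 beats 0.5 * 0.8), even
though the map rates the JPEG as the higher-quality source. <p>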

<h2>MultiViews</h2>

This is a per-directory option, meaning it can be set with an
<code>Options</code> directive within a <code>&lt;Directory&gt;</code>
section in <code>access.conf</code>, or (if <code>AllowOverride</code>
is properly set) in <code>.htaccess</code> files.  Note that
<code>Options All</code> does not set <code>MultiViews</code>; you
have to ask for it by name.  (Fixing this is a one-line change to
<code>httpd.h</code>).<p>


The effect of <code>MultiViews</code> is as follows: if the server
receives a request for <code>/some/dir/foo</code>, if
<code>/some/dir</code> has <code>MultiViews</code> enabled, and
<code>/some/dir/foo</code> does <em>not</em> exist, then the server reads the
directory looking for files named <code>foo.*</code>, and effectively fakes up a
type map which names all those files, assigning them the same media
types and content-encodings they would have had if the client had asked for
one of them by name.  It then chooses the best match to the client's
requirements, and returns that document.<p>
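
The search just described can be sketched in Python roughly as
follows.  This is only an illustration of the idea, not the server's
code: the function name <code>multiviews_candidates</code> is
invented, and Python's standard <code>mimetypes</code> module stands
in for the server's own suffix-to-type configuration.
<pre>
```python
import mimetypes
import os
import tempfile


def multiviews_candidates(directory, name):
    """Return synthesized type-map entries for `name`.

    Returns None when the file exists under that exact name, in which
    case it would simply be served as usual.
    """
    if os.path.exists(os.path.join(directory, name)):
        return None
    entries = []
    for filename in sorted(os.listdir(directory)):
        if not filename.startswith(name + "."):
            continue  # only files matching name.* take part
        media_type, encoding = mimetypes.guess_type(filename)
        entries.append({
            "uri": filename,
            "type": media_type,
            "encoding": encoding,
            "qs": 1.0,  # synthesized entries all get the same quality
        })
    return entries


# Small demonstration in a throwaway directory.
with tempfile.TemporaryDirectory() as demo_dir:
    for filename in ("foo.html", "foo.gif", "bar.html"):
        with open(os.path.join(demo_dir, filename), "w"):
            pass
    found = multiviews_candidates(demo_dir, "foo")
    direct = multiviews_candidates(demo_dir, "foo.gif")

print([entry["uri"] for entry in found])
print(direct)
```
</pre>
A request for <code>foo</code> yields the two <code>foo.*</code>
files as candidates, while a request for <code>foo.gif</code> by name
bypasses the search entirely. <p>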

This applies to searches for the file named by the
<code>DirectoryIndex</code> directive, if the server is trying to
index a directory; if the configuration files specify
<pre>

  DirectoryIndex index

</pre> then the server will arbitrate between <code>index.html</code>
and <code>index.html3</code> if both are present.  If neither is
present, and <code>index.cgi</code> is there, the server will run it.<p>

If one of the files found by the globbing is a CGI script, it's not
obvious what should happen.  My code gives that case special
treatment --- if the request was a POST, or a GET with QUERY_ARGS or
PATH_INFO, the script is given an extremely high quality rating, and
generally invoked; otherwise it is given an extremely low quality
rating, which generally causes one of the other views (if any) to be
retrieved.  This is the only jiggering of quality ratings done by the
MultiViews code; aside from that, all qualities in the synthesized
type maps are 1.0.<p>
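
The CGI rule above amounts to something like the following Python
sketch.  The two constants are placeholders: the text says only
"extremely high" and "extremely low", so the actual numbers here, like
the function name <code>cgi_quality</code>, are assumptions made for
the example.
<pre>
```python
# Placeholder values; the real ratings used by the server are not given.
HIGH_QS = 1000000.0
LOW_QS = 0.000001


def cgi_quality(method, query_string="", path_info=""):
    """Return the quality rating a CGI script would get for this request."""
    if method == "POST":
        return HIGH_QS
    if method == "GET" and (query_string or path_info):
        return HIGH_QS
    # A plain GET: prefer any static view over running the script.
    return LOW_QS


print(cgi_quality("POST"))
print(cgi_quality("GET", query_string="name=value"))
print(cgi_quality("GET"))
```
</pre>
Every other file found by the search keeps a quality of 1.0, so the
script wins only when the request looks like it was meant for a
script. <p>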

Note that this machinery only comes into play if the file which the
user attempted to retrieve does <em>not</em> exist by that name; if it
does, it is simply retrieved as usual.  (So, someone who actually asks
for <code>foo.jpeg</code>, as opposed to <code>foo</code>, never gets
<code>foo.gif</code>).



</body> </html>