Posted to docs@httpd.apache.org by ma...@apache.org on 2001/05/08 13:42:13 UTC
cvs commit: httpd-docs-1.3/htdocs/manual ebcdic.html

martin      01/05/08 04:42:12

  Modified:    htdocs/manual ebcdic.html
  Log:
  One of the requirements of XHTML is lowercase tag names, right?
  
  Revision  Changes    Path
  1.12      +161 -161  httpd-docs-1.3/htdocs/manual/ebcdic.html
  
  Index: ebcdic.html
  ===================================================================
  RCS file: /home/cvs/httpd-docs-1.3/htdocs/manual/ebcdic.html,v
  retrieving revision 1.11
  retrieving revision 1.12
  diff -u -u -r1.11 -r1.12
  --- ebcdic.html	2001/05/08 11:38:35	1.11
  +++ ebcdic.html	2001/05/08 11:42:01	1.12
  @@ -1,53 +1,53 @@
   <!DOCTYPE HTML PUBLIC "-//W3C//DTD HTML 3.2 Final//EN">
  -<HTML>
  -<HEAD>
  -<TITLE>The Apache EBCDIC Port</TITLE>
  -</HEAD>
  -
  -<!-- Background white, links blue (unvisited), navy (visited), red (active) -->
  -<BODY
  - BGCOLOR="#FFFFFF"
  - TEXT="#000000"
  - LINK="#0000FF"
  - VLINK="#000080"
  - ALINK="#FF0000"
  +<html>
  +<head>
  +<title>The Apache EBCDIC Port</title>
  +</head>
  +
  +<!-- background white, links blue (unvisited), navy (visited), red (active) -->
  +<body
  + bgcolor="#ffffff"
  + text="#000000"
  + link="#0000ff"
  + vlink="#000080"
  + alink="#ff0000"
   >
   <!--#include virtual="header.html" -->
  -<H1 ALIGN="CENTER">Overview of the Apache EBCDIC Port</H1>
  +<h1 align="center">Overview of the Apache EBCDIC Port</h1>
   
  - <P>
  + <p>
     As of Version 1.3, the Apache HTTP Server
     includes a port to (non-ASCII) mainframe machines which use
  -  the EBCDIC character set as their native codeset.<BR>
  +  the EBCDIC character set as their native codeset.<br>
     (Initially, that support covered only the Fujitsu-Siemens family of
     mainframes running the
  -  <A HREF="http://www.fujitsu-siemens.com/servers/bs2osd/osdbc_us.htm">BS2000/OSD
  -  operating system</A>, a mainframe OS which features a
  +  <a href="http://www.fujitsu-siemens.com/servers/bs2osd/osdbc_us.htm">BS2000/OSD
  +  operating system</a>, a mainframe OS which features a
     SVR4-derived POSIX subsystem. Later, the two IBM mainframe operating
     systems TPF and OS/390 were added).
  - </P>
  + </p>
   
  -<HR>
  +<hr>
   
  -<H2 ALIGN=CENTER><A NAME="ebcdic">EBCDIC-related conversion functions</A></H2>
  +<h2 align=center><a name="ebcdic">EBCDIC-related conversion functions</a></h2>
   
    The EBCDIC related directives 
  - <A HREF="mod/core.html#ebcdicconvert">EBCDICConvert</A>, 
  - <A HREF="mod/core.html#ebcdicconvertbytype">EBCDICConvertByType</A>, and
  - <A HREF="mod/core.html#ebcdickludge">EBCDICKludge</A>
  + <a href="mod/core.html#ebcdicconvert">EBCDICConvert</a>, 
  + <a href="mod/core.html#ebcdicconvertbytype">EBCDICConvertByType</a>, and
  + <a href="mod/core.html#ebcdickludge">EBCDICKludge</a>
    are available
  - <B>only if the platform's character set is EBCDIC</B>
  + <b>only if the platform's character set is EBCDIC</b>
    (This is currently only the case on Fujitsu-Siemens'
    BS2000/OSD and IBM's OS/390 and TPF operating systems). EBCDIC
  - stands for <EM>Extended Binary-Coded-Decimal Interchange Code</EM>
  + stands for <em>Extended Binary-Coded-Decimal Interchange Code</em>
    and is the codeset used on mainframe machines, in contrast to
    ASCII which is ubiquitous on almost all micro computers today.
  - ASCII (or its extension <EM>latin1</EM>) is the basis for the HTTP
  + ASCII (or its extension <em>latin1</em>) is the basis for the HTTP
    transfer protocol, therefore all EBCDIC-based platforms need a
    way to configure the code set conversion rules required between
  - the EBCDIC based mainframe host and the HTTP socket protocol.<BR>
  + the EBCDIC based mainframe host and the HTTP socket protocol.<br>
   
  -<P>
  +<p>
    On an EBCDIC based system, HTML files and other text files are
    usually saved encoded in the native EBCDIC code set, while image
    files and other binary data are stored with identical encoding as
  @@ -56,120 +56,120 @@
    converted to/from ASCII, depending on the transfer direction)
    and binary files (to be delivered unconverted).
    Such a distinction can be made based on the assigned MIME type, or
  - based on the file extension (<EM>i.e.</EM>, files sharing a common file
  + based on the file extension (<em>i.e.</em>, files sharing a common file
    suffix).
  -</P>
  +</p>
   
  -<P>
  +<p>
    By default, the configuration is symmetric for input and output
  - (<EM>i.e.</EM>, when a PUT request is executed for a document which was
  + (<em>i.e.</em>, when a PUT request is executed for a document which was
    returned by a previous GET request, then the resulting uploaded
    copy should be identical to the original file). However, the
    conversion directives allow for specifying different conversions
    for input and output.
  -</P>
  +</p>
   
  -<P>
  +<p>
    The directives <a href="mod/core.html#ebcdicconvert">EBCDICConvert</a> and
    <a href="mod/core.html#ebcdicconvertbytype">EBCDICConvertByType</a> are used to
    assign the conversion setting (On or Off) based on file
    extensions or MIME types. Each configuration setting can be defined
  - for input only (<EM>e.g.</EM>, PUT method), output only (<EM>e.g.</EM>, GET method),
  + for input only (<em>e.g.</em>, PUT method), output only (<em>e.g.</em>, GET method),
    or both input and output. By default, the conversion setting is
    applied for input and output.
  -</P>
  +</p>
   
  -<P>
  +<p>
    Note that after modifying the conversion settings for a group of
    files, it is not sufficient to restart the server. The reason for
    this is the fact that a cached copy of a document (in a browser or
    proxy cache) will not get revalidated by contents, but only by
    date. Since the modification time of the document did not change,
  - browsers will assume they can reuse the cached copy.<BR>
  + browsers will assume they can reuse the cached copy.<br>
    To recover from this situation, you must either clear all cached
    copies (browser and proxy cache!), or update the modification time
  - of the documents (using the <CODE>touch</CODE> command on the server).
  -</P>
  + of the documents (using the <code>touch</code> command on the server).
  +</p>
   
  -<P>
  +<p>
    Note also that server-parsed documents (CGI scripts, .shtml files,
    and other interpreted files like PHP scripts etc.) are not subject to
    any input conversion and must therefore be stored in EBCDIC form
    on the server side.
  -</P>
  +</p>
   
  -<P>
  +<p>
    In absense of any
  - <A HREF="mod/core.html#ebcdicconvertbytype">EBCDICConvertByType</A> directive,
  - and if no matching <A HREF="mod/core.html#ebcdicconvert">EBCDICConvert</A> was
  + <a href="mod/core.html#ebcdicconvertbytype">EBCDICConvertByType</a> directive,
  + and if no matching <a href="mod/core.html#ebcdicconvert">EBCDICConvert</a> was
    found, Apache falls back to an internal heuristic which assumes
    that all documents with MIME types starting with
  - <SAMP>"text/"</SAMP>, <SAMP>"message/"</SAMP> or
  - <SAMP>"multipart/"</SAMP> as well as the MIME type
  - <SAMP>"application/x-www-form-urlencoded"</SAMP> are text documents
  + <samp>"text/"</samp>, <samp>"message/"</samp> or
  + <samp>"multipart/"</samp> as well as the MIME type
  + <samp>"application/x-www-form-urlencoded"</samp> are text documents
    stored in EBCDIC, whereas all other documents are binary files.
  -</P>
  +</p>
   
  -<P>
  +<p>
    In order to provide backward compatibility with older versions of
  - apache, the <A HREF="mod/core.html#ebcdickludge">EBCDICKludge</A> directive
  + apache, the <a href="mod/core.html#ebcdickludge">EBCDICKludge</a> directive
    allows for a less powerful mechanism to control the conversion of
    documents to and from EBCDIC.
  -</P>
  +</p>
   
  -<P>
  - <STRONG>Note</STRONG>:<BLOCKQUOTE>
  +<p>
  + <strong>Note</strong>:<blockquote>
    The EBCDICKludge directive is deprecated, since its functionality
    is superseded by the more powerful
  - <A HREF="mod/core.html#ebcdicconvert">EBCDICConvert</A> and
  - <A HREF="mod/core.html#ebcdicconvertbytype">EBCDICConvertByType</A>
  - directives.</BLOCKQUOTE>
  -</P>
  + <a href="mod/core.html#ebcdicconvert">EBCDICConvert</a> and
  + <a href="mod/core.html#ebcdicconvertbytype">EBCDICConvertByType</a>
  + directives.</blockquote>
  +</p>
   
  -<P>
  +<p>
    The directives are applied in the following order:
  - <OL>
  -  <LI>First, the configured <A HREF="mod/core.html#ebcdicconvert">EBCDICConvert</A>
  + <ol>
  +  <li>First, the configured <a href="mod/core.html#ebcdicconvert">EBCDICConvert</a>
         directives in the current context are evaluated in
         configuration file order. As soon as a matching file extension
         is found, the search stops and the configured conversion is
  -      applied.<BR>
  +      applied.<br>
   
         EBCDICConvert settings inherited from parent directories are
         tested after the more specific (deeper) directory levels.
  -      </LI>
  -  <LI>If the <A HREF="mod/core.html#ebcdickludge">EBCDICKludge</A> is in effect,
  +      </li>
  +  <li>If the <a href="mod/core.html#ebcdickludge">EBCDICKludge</a> is in effect,
         the next step tests for a MIME type of the format
  -      <SAMP><I>type/</I><B>x-ascii-</B><I>subtype</I></SAMP>. If the
  +      <samp><i>type/</i><b>x-ascii-</b><i>subtype</i></samp>. If the
         document has such a type, then the
  -      <SAMP>"<B>x-ascii-</B>"</SAMP> substring is removed and the
  -      conversion set to <SAMP>Off</SAMP>.
  -      </LI>
  -  <LI>In the next step, the configured
  -      <A HREF="mod/core.html#ebcdicconvertbytype">EBCDICConvertByType</A>
  +      <samp>"<b>x-ascii-</b>"</samp> substring is removed and the
  +      conversion set to <samp>Off</samp>.
  +      </li>
  +  <li>In the next step, the configured
  +      <a href="mod/core.html#ebcdicconvertbytype">EBCDICConvertByType</a>
         directives are evaluated in configuration file order. If
         the document has a matching MIME type, the search stops and
  -      the configured conversion is applied.<BR>
  +      the configured conversion is applied.<br>
   
         EBCDICConvertByType settings inherited from parent
         directories are tested after the more specific (deeper)
  -      directory levels.<BR>
  +      directory levels.<br>
   
  -      If no <A HREF="mod/core.html#ebcdicconvertbytype">EBCDICConvertByType</A>
  +      If no <a href="mod/core.html#ebcdicconvertbytype">EBCDICConvertByType</a>
         directive at all exists in the current context, the server
         falls back to the simple heuristics which assume that MIME
         types starting with "text/", "message/" or "multipart/" (plus
         the special type "application/x-www-form-urlencoded" used in
         simple POST requests) imply a conversion, while all the rest
  -      is delivered unconverted (<EM>i.e.</EM>, binary).
  -      </LI>
  - </OL>
  -</P>
  +      is delivered unconverted (<em>i.e.</em>, binary).
  +      </li>
  + </ol>
  +</p>
   
  -<HR>
  +<hr>
   
  - <H2 ALIGN=CENTER><A NAME="tech">Technical Details</A></H2>
  - <P>
  + <h2 align=center><a name="tech">Technical Details</a></h2>
  + <p>
     Since all Apache input and output is based upon the BUFF data type
     and its methods, the easiest solution was to add the actual
     conversion to the BUFF handling routines. The conversion must be
  @@ -178,8 +178,8 @@
     Two such flags exist: one for data read from the client
     (ASCII to EBCDIC conversion) and one for data returned to the
     client (EBCDIC to ASCII conversion).
  - </P>
  - <P>
  + </p>
  + <p>
     During sending of the header, Apache determines (based on the
     returned MIME type for the request) whether conversion should be used
     or the document returned unconverted. It uses this decision to
  @@ -187,92 +187,92 @@
     Modules should therefore determine the MIME type for the
     current request before initiating the response by calling
     ap_send_http_headers().
  - </P>
  - <P>
  + </p>
  + <p>
     The BUFF flag is modified at
     several points in the HTTP protocol:
  - </P>
  + </p>
   
  -  <UL>
  -   <LI><STRONG>set</STRONG> (In and Out) before a request is
  +  <ul>
  +   <li><strong>set</strong> (In and Out) before a request is
          received (because the request and the request header
          lines are always in ASCII format)
   
  -   <LI><STRONG>set/unset</STRONG> (for Input data) when the request body is
  +   <li><strong>set/unset</strong> (for Input data) when the request body is
          received - depending on the content type of the request body
          (because the request body may contain ASCII text or a binary file)
   
  -   <LI><STRONG>set</STRONG> (for returned Output) before a response
  +   <li><strong>set</strong> (for returned Output) before a response
          header is sent (because the response header lines are always
          in ASCII format)
   
  -   <LI><STRONG>set/unset</STRONG> (for returned Output) when the
  +   <li><strong>set/unset</strong> (for returned Output) when the
          response body is sent - depending on the content type of the
          response body (because the response body may contain text or
          a binary file)
  -  </UL>
  +  </ul>
     Additional transparent transitions may occur for extracting/inserting
     the HTTP/1.1 chunking information from/into the input/output body data
  -  stream, and for generating <EM>multipart</EM> headers for <EM>range</EM>
  +  stream, and for generating <em>multipart</em> headers for <em>range</em>
     requests. (See RFC2616 and src/main/http_protocol.c for details.)
   
   
  - <HR>
  - <H2 ALIGN=CENTER><A NAME="port">Porting Notes</A></H2>
  + <hr>
  + <h2 align=center><a name="port">Porting Notes</a></h2>
   
  -  <OL>
  -   <LI>
  +  <ol>
  +   <li>
      The relevant changes in the source are #ifdef'ed into two
      categories:
  -   <DL>
  -    <DT><CODE><STRONG>#ifdef CHARSET_EBCDIC</STRONG></CODE>
  -    <DD>Code which is needed for any EBCDIC based machine. This
  +   <dl>
  +    <dt><code><strong>#ifdef CHARSET_EBCDIC</strong></code>
  +    <dd>Code which is needed for any EBCDIC based machine. This
   	includes character translations, differences in
   	contiguity of the two character sets, flags which
   	indicate which part of the HTTP protocol has to be
  -	converted and which part doesn't <EM>etc.</EM>
  -    <DT><CODE><STRONG>#ifdef _OSD_POSIX | TPF | OS390</STRONG></CODE>
  -    <DD>Code which is needed for the Fujitsu-Siemens BS2000/OSD | IBM TPF |
  +	converted and which part doesn't <em>etc.</em>
  +    <dt><code><strong>#ifdef _OSD_POSIX | TPF | OS390</strong></code>
  +    <dd>Code which is needed for the Fujitsu-Siemens BS2000/OSD | IBM TPF |
           IBM OS390 mainframe platforms only. This deals with include file
   	differences and socket and fork implementation topics which are
   	only required on the respective platform.
  -   <BR>
  -   </DL>
  -   </LI>
  +   <br>
  +   </dl>
  +   </li>
   
  -   <LI>
  +   <li>
       The possibility to translate between ASCII and EBCDIC at the
       socket level (on BS2000 POSIX, there is a socket option which
  -    supports this) was intentionally <EM>not</EM> chosen, because
  +    supports this) was intentionally <em>not</em> chosen, because
       the byte stream at the HTTP protocol level consists of a
       mixture of protocol related strings and non-protocol related
       raw file data. HTTP protocol strings are always encoded in
       ASCII (the GET request, any Header: lines, the chunking
  -    information <EM>etc.</EM>) whereas the file transfer parts (<EM>i.e.</EM>, GIF
  -    images, CGI output <EM>etc.</EM>) should usually be just "passed through"
  +    information <em>etc.</em>) whereas the file transfer parts (<em>i.e.</em>, GIF
  +    images, CGI output <em>etc.</em>) should usually be just "passed through"
       by the server. This separation between "protocol string" and
       "raw data" is reflected in the server code by functions like
       bgets() or rvputs() for strings, and functions like bwrite()
       for binary data. A global translation of everything would
  -    therefore be inadequate.<BR>
  +    therefore be inadequate.<br>
       (In the case of text files of course, provisions must be made so
       that EBCDIC documents are always served in ASCII)
  -   <BR>
  +   <br>
       This port therefore features a built-in protocol level conversion
       for the server-internal strings (which the compiler translated to
       EBCDIC strings) and thus for all server-generated documents.
  -   <BR>
  -   </LI>
  +   <br>
  +   </li>
   
  -   <LI>
  +   <li>
       By examining the call hierarchy for the BUFF management
       routines, I added an "ebcdic/ascii conversion layer" which
       would be crossed on every puts/write/get/gets, and
       conversion flags which allowed enabling/disabling the
       conversions on-the-fly. Usually, a document crosses this
       layer twice from its origin source (a file or CGI output) to
  -    its destination (the requesting client): <SAMP>file -&gt;
  -    Apache</SAMP>, and <SAMP>Apache -&gt; client</SAMP>.<BR>
  +    its destination (the requesting client): <samp>file -&gt;
  +    Apache</samp>, and <samp>Apache -&gt; client</samp>.<br>
       The server can now read the header
       lines of a CGI-script output in EBCDIC format, and then find
       out that the remainder of the script's output is in ASCII
  @@ -282,80 +282,80 @@
       based on the type of document being served, whether the
       document body (except for the chunking information, of
       course) is in ASCII already or must be converted from EBCDIC.
  -   <BR>
  -   </LI>
  +   <br>
  +   </li>
   
  -   <LI>
  +   <li>
       By default, Apache assumes that documents with the MIME types
       "text/*", "message/*", "multipart/*" and "application/x-www-form-urlencoded"
       are text documents and are stored as EBCDIC files, whereas all
       other files are binary files (and stored in a byte-identical
  -    encoding as on an ASCII machine).<BR>
  +    encoding as on an ASCII machine).<br>
       These defaults can be overridden
  -    on a <A HREF="mod/core.html#ebcdicconvertbytype">by-MIME-type</A> and/or
  -    <A HREF="mod/core.html#ebcdicconvert">by-file-extension</A> basis, using the
  -    directives<PRE>
  -     <A HREF="mod/core.html#ebcdicconvertbytype">EBCDICConvertByType</A> {On|Off}[={In|Out|InOut}] <EM>mimetype</EM> [...]
  -     <A HREF="mod/core.html#ebcdicconvert">EBCDICConvert</A>       {On|Off}[={In|Out|InOut}] <EM>fileext</EM> [...]
  -    </PRE> where the <EM>mimetype</EM> argument may contain wildcards.
  -   <BR>
  -   </LI>
  +    on a <a href="mod/core.html#ebcdicconvertbytype">by-MIME-type</a> and/or
  +    <a href="mod/core.html#ebcdicconvert">by-file-extension</a> basis, using the
  +    directives<pre>
  +     <a href="mod/core.html#ebcdicconvertbytype">EBCDICConvertByType</a> {On|Off}[={In|Out|InOut}] <em>mimetype</em> [...]
  +     <a href="mod/core.html#ebcdicconvert">EBCDICConvert</a>       {On|Off}[={In|Out|InOut}] <em>fileext</em> [...]
  +    </pre> where the <em>mimetype</em> argument may contain wildcards.
  +   <br>
  +   </li>
   
  -   <LI>
  +   <li>
       Before adding the flexible conversion, non-text documents were
       always served "binary" without conversion.
  -    This seemed to be the most sensible choice for, .<EM>e.g.</EM>, GIF/ZIP/AU
  +    This seemed to be the most sensible choice for, .<em>e.g.</em>, GIF/ZIP/AU
       file types (It of course requires the user to copy them to the
       mainframe host using the "rcp -b" binary switch), but proved to be
  -    inadequate for MIME types like <SAMP>model/vrml</SAMP>,
  -    <SAMP>application/postscript</SAMP> and <SAMP>application/x-javascript</SAMP>.
  -   <BR>
  -   </LI>
  +    inadequate for MIME types like <samp>model/vrml</samp>,
  +    <samp>application/postscript</samp> and <samp>application/x-javascript</samp>.
  +   <br>
  +   </li>
   
  -   <LI>
  -    Server parsed files are always assumed to be in native (<EM>i.e.</EM>,
  +   <li>
  +    Server parsed files are always assumed to be in native (<em>i.e.</em>,
       EBCDIC) format as used on the machine (because they do not cross the
       conversion layer when being read), and are converted after processing.
  -   <BR>
  -   </LI>
  +   <br>
  +   </li>
   
  -   <LI>
  +   <li>
       For CGI output, the CGI script determines whether a conversion is
       needed or not: by setting the appropriate Content-Type, text files
       can be converted, or GIF output can be passed through unmodified
       (depending on the conversion configured in the script's context).
  -   <BR>
  -   </LI>
  -  </OL>
  -
  - <HR>
  -
  - <H2 ALIGN=CENTER><A NAME="store">Document Storage Notes</A></H2>
  -  <H3 ALIGN=CENTER>Binary Files</H3>
  -   <P>
  +   <br>
  +   </li>
  +  </ol>
  +
  + <hr>
  +
  + <h2 align=center><a name="store">Document Storage Notes</a></h2>
  +  <h3 align=center>Binary Files</h3>
  +   <p>
       When exchanging binary files between the mainframe host and a
       Unix machine or Windows PC, be sure to use the ftp "binary"
  -    (<SAMP>TYPE I</SAMP>) command, or use the
  -    <SAMP>rcp&nbsp;-b</SAMP> command from the mainframe host
  +    (<samp>TYPE I</samp>) command, or use the
  +    <samp>rcp&nbsp;-b</samp> command from the mainframe host
       (the -b switch is not supported in unix rcp's).
  -   </P>
  +   </p>
   
  -  <H3 ALIGN=CENTER>Text Documents</H3>
  -   <P>
  +  <h3 align=center>Text Documents</h3>
  +   <p>
       The default assumption of the server is that Text Files
  -    (<EM>i.e.</EM>, all files whose <SAMP>Content-Type:</SAMP> starts with
  -    <SAMP>text/</SAMP>) are stored in the native character
  +    (<em>i.e.</em>, all files whose <samp>Content-Type:</samp> starts with
  +    <samp>text/</samp>) are stored in the native character
       set of the host, EBCDIC.
  -   </P>
  +   </p>
   
  -  <H3 ALIGN=CENTER>Server Side Included Documents</H3>
  -   <P>
  +  <h3 align=center>Server Side Included Documents</h3>
  +   <p>
       SSI documents must currently be stored in EBCDIC only. No
       provision is made to convert them from ASCII before processing.
       The same holds for other interpreted languages, like
       mod_perl or mod_php.
  -   </P>
  +   </p>
   
   <!--#include virtual="footer.html" -->
  -</BODY>
  -</HTML>
  +</body>
  +</html>
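
For reference, the directive order described in the ebcdic.html text above (EBCDICConvert by file extension, then EBCDICKludge, then EBCDICConvertByType, then the built-in heuristic) can be sketched roughly as follows. This is a minimal Python sketch, not Apache's C implementation. The names decide_conversion, heuristic_is_text, by_ext and by_type are hypothetical, extension matching is simplified to a suffix test, and the behaviour when EBCDICConvertByType directives exist but none of them match is an assumption (the document only specifies the case where no such directive exists at all).

```python
from fnmatch import fnmatchcase

# MIME-type prefixes that the documented fallback heuristic treats as EBCDIC text.
TEXT_PREFIXES = ("text/", "message/", "multipart/")
FORM_TYPE = "application/x-www-form-urlencoded"


def heuristic_is_text(mime_type):
    """Fallback from the document: text-like types imply a conversion."""
    return mime_type.startswith(TEXT_PREFIXES) or mime_type == FORM_TYPE


def decide_conversion(filename, mime_type, by_ext=(), by_type=(), kludge=False):
    """Return (convert, mime_type) following the documented order.

    by_ext  -- hypothetical list of (extension, on) pairs, deepest directory first
    by_type -- hypothetical list of (mime_pattern, on) pairs, deepest directory first
    """
    # 1. EBCDICConvert: the first matching file extension wins.
    for ext, on in by_ext:
        if filename.endswith(ext):  # simplified suffix match (assumption)
            return on, mime_type

    # 2. EBCDICKludge: type/x-ascii-subtype means "stored as ASCII already";
    #    the marker is stripped and conversion is switched off.
    if kludge:
        major, _, sub = mime_type.partition("/")
        if sub.startswith("x-ascii-"):
            return False, major + "/" + sub[len("x-ascii-"):]

    # 3. EBCDICConvertByType: the first matching MIME pattern wins.
    for pattern, on in by_type:
        if fnmatchcase(mime_type, pattern):
            return on, mime_type

    # 4. Built-in heuristic. The document specifies it for the case where no
    #    EBCDICConvertByType directive exists; using it when directives exist
    #    but none match is an assumption of this sketch.
    return heuristic_is_text(mime_type), mime_type
```

With no directives configured, decide_conversion("index.html", "text/html") selects conversion and decide_conversion("logo.gif", "image/gif") does not, which mirrors the default text/binary split described in the document.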
  
  
  
