Posted to svn@forrest.apache.org by cr...@apache.org on 2004/10/26 09:31:51 UTC

svn commit: rev 55588 - forrest/trunk/src/documentation/content/xdocs

Author: crossley
Date: Tue Oct 26 00:31:51 2004
New Revision: 55588

Modified:
   forrest/trunk/src/documentation/content/xdocs/faq.xml
Log:
Tidy the FAQ for "encoding". Fix spelling errors and grammar.
Move it to the Technical section.
Add discussion of character entity references.
Link to some useful resources.


Modified: forrest/trunk/src/documentation/content/xdocs/faq.xml
==============================================================================
--- forrest/trunk/src/documentation/content/xdocs/faq.xml	(original)
+++ forrest/trunk/src/documentation/content/xdocs/faq.xml	Tue Oct 26 00:31:51 2004
@@ -122,57 +122,6 @@
         </p>
       </answer>
     </faq>
-    <faq id="encoding">
-        <question>Does forrest like accents?</question>
-        <answer>
-            <p>Short answer: yes, forrest can process text in any language so you can include:</p>
-            <dl>
-            <dt>accents</dt><dd>áéíóú</dd>
-            <dt>dieresis</dt><dd>äëïöü</dd>
-            <dt>tildes</dt><dd>ãñĩõũ</dd>
-            <dt>Everything that has a computer character representation</dt><dd><!-- include other common non-ASCII characters--></dd>
-            </dl>
-            <p>This is because sources for forrest docs are xml documents, which can include any of these
-            as long as the encoding your xml doc declares matches the actual encoding used in the file containing
-            the xml. For instance if you declare:</p>
-            <source>
-            <![CDATA[
-<?xml version="1.0" encoding="UTF-8"?>
-            ]]>
-            </source>
-            <p>but the file is actually using ISO-8859-1 you will probably get a validation error. specially if 
-            you include some non-ASCII characters.</p>
-            <p>
-            This situation is commonly encountered when you edit the templates created by <code>forrest seed</code> with your favorite 
-            (probably localized) editor without paying attention to the encoding. Or when you create a new file 
-            and simply copy the headers from another file
-            </p>
-            <p>Though UTF-8 is an encoding well suited for most languages is not ussually the default
-            in popular editors or systems.</p>
-            <p>In UNIX-like systems, most popular editors can handle different encodings to 
-            write the file in disk. On some editors the encoding of the file is preserved, in others the default
-            is used regardless the original encoding. On most cases the encoding used to write files
-            can be controled by setting the enviroment variable <code>LANG</code>
-            to an appropiate value, for instance:
-            </p>
-            <source>
-            <![CDATA[
-$ export LANG=en_US.UTF-8
-            ]]>
-            </source>
-            <p>Of course the <em>appropiate</em> way of setting the encoding to use depends on the editor/OS, 
-            but ultimately relays on the user preferences. So you can use the encoding you prefer as long as 
-            the <code>encoding</code> attribute of the xml declaration matches the actual encoding of the file. 
-            This means 
-            that if you are not willing to abandon ISO-8859-1 you can always use the following declaration instead:</p>
-            <source>
-            <![CDATA[
-<?xml version="1.0" encoding="ISO-8859-1"?>
-            ]]>
-            </source>
-            
-        </answer>
-    </faq>
   </part>
 
   <part id="technical">
@@ -459,6 +408,58 @@
         <link href="http://www.w3.org/TR/xhtml-modularization/dtd_module_defs.html#a_xhtml_character_entities">XHTML Character Entities</link>
         and see more discussion at
         <link href="http://issues.cocoondev.org/browse/FOR-244">Issue FOR-244</link>.
+        </p>
+      </answer>
+    </faq>
+
+    <faq id="encoding">
+      <question>Does Forrest handle accents for non-English languages?</question>
+      <answer>
+        <p>Yes, Forrest can process text in any language, so you can include:</p>
+        <ul>
+          <li>accents: á é í ó ú</li>
+          <li>diereses: ä ë ï ö ü</li>
+          <li>tildes: ã ñ ĩ õ ũ</li>
+        </ul>
+        <p>This is because sources for Forrest docs are xml documents, which can include any of these,
+          provided the encoding declared by the xml doc matches the actual encoding used in the file.
+          For example, if you declare the default encoding:</p>
+        <source><![CDATA[<?xml version="1.0" encoding="UTF-8"?>]]></source>
+        <p>but the file content is actually using ISO-8859-1 then you will
+           receive validation errors, especially if 
+           you include some non-ASCII characters.</p>
+        <p>
+          This situation is commonly encountered when you edit the templates
+          created by <code>forrest seed</code> with your favorite 
+          (probably localized) editor without paying attention to the encoding,
+          or when you create a new file 
+          and simply copy the headers from another file.
+        </p>
+        <p>Although UTF-8 is an encoding well-suited for most languages,
+          it is not usually the default in popular editors or systems.
+          In UNIX-like systems, most popular editors can handle different encodings when
+          writing the file to disk. With some editors the encoding of the file is preserved, while with others the default
+          is used regardless of the original encoding. In most cases the encoding used to write files
+          can be controlled by setting the environment variable <code>LANG</code>
+          to an appropriate value, for instance:
+        </p>
+        <source>[localhost]$ export LANG=en_US.UTF-8</source>
+        <p>Of course the <em>appropriate</em> way to set the encoding depends on the editor/OS, 
+          but ultimately relies on the user's preferences. So you can use the encoding you prefer, as long as 
+          the <code>encoding</code> attribute of the xml declaration matches the actual encoding of the file. 
+          This means 
+          that if you are not willing to abandon ISO-8859-1 you can always use the following declaration instead:</p>
+        <source><![CDATA[<?xml version="1.0" encoding="ISO-8859-1"?>]]></source>
+        <p>Another option is to use "character entities" such as
+        <code><![CDATA[&ouml;]]></code> (&ouml;) or the numeric form
+        <code><![CDATA[&#246;]]></code> (&#246;).
+        </p>
+        <p>A related issue is that your web server needs to send HTTP
+          headers whose charset definition matches the encoding of the HTML page.
+        </p>
+        <p>Here are some references which explain further:
+          <link href="http://orixo.com/events/gt2004/bios.html#torsten">GT2004 presentation by Torsten Schlabach</link> and 
+          <link href="http://www.alanwood.net/unicode/">Alan Wood's Unicode resources</link>.
         </p>
       </answer>
     </faq>