Posted to svn@forrest.apache.org by cr...@apache.org on 2004/10/26 09:31:51 UTC
svn commit: rev 55588 - forrest/trunk/src/documentation/content/xdocs
Author: crossley
Date: Tue Oct 26 00:31:51 2004
New Revision: 55588
Modified:
forrest/trunk/src/documentation/content/xdocs/faq.xml
Log:
Tidy the FAQ for "encoding". Fix spelling errors and grammar.
Move it to the Technical section.
Add discussion of character entity references.
Link to some useful resources.
Modified: forrest/trunk/src/documentation/content/xdocs/faq.xml
==============================================================================
--- forrest/trunk/src/documentation/content/xdocs/faq.xml (original)
+++ forrest/trunk/src/documentation/content/xdocs/faq.xml Tue Oct 26 00:31:51 2004
@@ -122,57 +122,6 @@
</p>
</answer>
</faq>
- <faq id="encoding">
- <question>Does forrest like accents?</question>
- <answer>
- <p>Short answer: yes, forrest can process text in any language so you can include:</p>
- <dl>
- <dt>accents</dt><dd>áéíóú</dd>
- <dt>dieresis</dt><dd>äëïöü</dd>
- <dt>tildes</dt><dd>ãñĩõũ</dd>
- <dt>Everything that has a computer character representation</dt><dd><!-- include other common non-ASCII characters--></dd>
- </dl>
- <p>This is because sources for forrest docs are xml documents, which can include any of these
- as long as the encoding your xml doc declares matches the actual encoding used in the file containing
- the xml. For instance if you declare:</p>
- <source>
- <![CDATA[
-<?xml version="1.0" encoding="UTF-8"?>
- ]]>
- </source>
- <p>but the file is actually using ISO-8859-1 you will probably get a validation error. specially if
- you include some non-ASCII characters.</p>
- <p>
- This situation is commonly encountered when you edit the templates created by <code>forrest seed</code> with your favorite
- (probably localized) editor without paying attention to the encoding. Or when you create a new file
- and simply copy the headers from another file
- </p>
- <p>Though UTF-8 is an encoding well suited for most languages is not ussually the default
- in popular editors or systems.</p>
- <p>In UNIX-like systems, most popular editors can handle different encodings to
- write the file in disk. On some editors the encoding of the file is preserved, in others the default
- is used regardless the original encoding. On most cases the encoding used to write files
- can be controled by setting the enviroment variable <code>LANG</code>
- to an appropiate value, for instance:
- </p>
- <source>
- <![CDATA[
-$ export LANG=en_US.UTF-8
- ]]>
- </source>
- <p>Of course the <em>appropiate</em> way of setting the encoding to use depends on the editor/OS,
- but ultimately relays on the user preferences. So you can use the encoding you prefer as long as
- the <code>encoding</code> attribute of the xml declaration matches the actual encoding of the file.
- This means
- that if you are not willing to abandon ISO-8859-1 you can always use the following declaration instead:</p>
- <source>
- <![CDATA[
-<?xml version="1.0" encoding="ISO-8859-1"?>
- ]]>
- </source>
-
- </answer>
- </faq>
</part>
<part id="technical">
@@ -459,6 +408,58 @@
<link href="http://www.w3.org/TR/xhtml-modularization/dtd_module_defs.html#a_xhtml_character_entities">XHTML Character Entities</link>
and see more discussion at
<link href="http://issues.cocoondev.org/browse/FOR-244">Issue FOR-244</link>.
+ </p>
+ </answer>
+ </faq>
+
+ <faq id="encoding">
+ <question>Does Forrest handle accents for non-English languages?</question>
+ <answer>
+ <p>Yes, Forrest can process text in any language, so you can include:</p>
+ <ul>
+ <li>accents: á é í ó ú</li>
+ <li>diereses: ä ë ï ö ü</li>
+ <li>tildes: ã ñ ĩ õ ũ</li>
+ </ul>
+ <p>This is because sources for Forrest docs are xml documents, which can include any of these,
+ provided the encoding declared by the xml doc matches the actual encoding used in the file.
+ For example if you declare the default encoding:</p>
+ <source><![CDATA[<?xml version="1.0" encoding="UTF-8"?>]]></source>
+ <p>but the file content is actually using ISO-8859-1 then you will
+ receive validation errors, especially if
+ you include some non-ASCII characters.</p>
+ <p>
+ This situation is commonly encountered when you edit the templates
+ created by <code>forrest seed</code> with your favorite
+ (probably localized) editor without paying attention to the encoding,
+ or when you create a new file
+ and simply copy the headers from another file.
+ </p>
+ <p>Although UTF-8 is an encoding well-suited for most languages,
+ it is not usually the default in popular editors or systems.
+ In UNIX-like systems, most popular editors can write files to disk
+ in different encodings. With some editors the encoding of the file is preserved, while with others the default
+ is used regardless of the original encoding. In most cases the encoding used to write files
+ can be controlled by setting the environment variable <code>LANG</code>
+ to an appropriate value, for instance:
+ </p>
+ <source>[localhost]$ export LANG=en_US.UTF-8</source>
+ <p>Of course the <em>appropriate</em> way to set the encoding depends on the editor/OS,
+ but ultimately relies on the user's preferences. So you can use the encoding you prefer, as long as
+ the <code>encoding</code> attribute of the xml declaration matches the actual encoding of the file.
+ This means
+ that if you are not willing to abandon ISO-8859-1 you can always use the following declaration instead:</p>
+ <source><![CDATA[<?xml version="1.0" encoding="ISO-8859-1"?>]]></source>
+ <p>Another option is to use "character entities" such as
+ <code><![CDATA[&ouml;]]></code> (ö) or the numeric form
+ <code><![CDATA[&#246;]]></code> (ö).
+ </p>
+ <p>Another related issue is that your web server needs to send HTTP
+ headers whose charset definition matches the encoding of the html page.
+ </p>
+ <p>Here are some references which explain further:
+ <link href="http://orixo.com/events/gt2004/bios.html#torsten">GT2004 presentation by Torsten Schlabach</link> and
+ <link href="http://www.alanwood.net/unicode/">Alan Wood's Unicode resources</link>.
</p>
</answer>
</faq>
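
The mismatch described in the "encoding" FAQ entry above can be checked mechanically before a file is committed. The following is only a minimal sketch in Python and is not part of Forrest or of this commit: the helper names `declared_encoding` and `matches_declaration` are hypothetical, and the check assumes the file starts directly with its XML declaration (no byte order mark).

```python
import re


def declared_encoding(data: bytes, default: str = "UTF-8") -> str:
    """Return the encoding named in the XML declaration, or the XML default."""
    match = re.match(rb'<\?xml[^>]*?encoding=["\']([A-Za-z0-9._-]+)["\']', data)
    return match.group(1).decode("ascii") if match else default


def matches_declaration(data: bytes) -> bool:
    """Return True if the bytes decode cleanly under the declared encoding."""
    try:
        data.decode(declared_encoding(data))
    except (UnicodeDecodeError, LookupError):
        # LookupError covers encoding names that Python does not know.
        return False
    return True


# Hypothetical sample document: the same text saved in two encodings.
text = '<?xml version="1.0" encoding="UTF-8"?>\n<p>áéíóú</p>\n'
good = text.encode("utf-8")
bad = text.encode("iso-8859-1")  # declares UTF-8 but is stored as ISO-8859-1

print(matches_declaration(good))
print(matches_declaration(bad))
```

This only catches one direction of the problem. Every byte sequence is valid ISO-8859-1, so a file that declares ISO-8859-1 but was actually saved as UTF-8 passes this check and simply displays the wrong characters.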