Posted to general@xml.apache.org by Syd Bauman <Sy...@Brown.edu> on 2005/01/14 07:53:05 UTC
crimson (and thus jing) barfs on xml:lang="{ISO 639-2}"
My apologies if this should have been sent elsewhere; I did not find
any specific place to report bugs in either jing or crimson. Feel
free to say "not our problem, send it to XYZ".
When jing comes across

    <foreign xml:lang="spa">este</foreign>

it reports

    error: Illegal xml:lang value "spa".
However, "spa" is the correct ISO 639-2 (3-letter) code for Spanish.
One could argue that, because "es" (the ISO 639-1 2-letter code for
Spanish) is preferred, the error message is overly dramatic but
essentially correct. However, jing also flags xml:lang="grc" as an
error, and there is no 2-letter code for Ancient Greek.
ISO 639-2 (3-letter) codes are permitted by RFC 3066[1], which is how
values of xml:lang= are supposed to be interpreted[2].
Jing is relying on a modified version of crimson's parser 1.16. I do
not speak Java, but it seems to me the problem is in the file
Parser2.java in the definition of isXmlLang(). The commented
production is
    // [35] ISO639Code ::= [a-zA-Z] [a-zA-Z]

where (I think) it should be

    // [35] ISO639Code ::= [a-zA-Z] [a-zA-Z] [a-zA-Z]?

with code to match.[3]
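
For illustration only, here is a rough sketch of what the relaxed check
might look like as a standalone class. This is not crimson's code: the
class XmlLangSketch and the helpers allLetters() and isSubtag() are
hypothetical, and only the name isXmlLang() comes from Parser2.java. It
assumes the primary tag is either a 2- or 3-letter code, or "i"/"x"
followed by at least one subtag, and that later subtags are 1 to 8
ASCII letters or digits as in RFC 3066.

```java
// Hypothetical standalone sketch; NOT the actual code in crimson's Parser2.java.
final class XmlLangSketch {

    // Returns true if value looks like a language tag whose primary
    // subtag is a 2- or 3-letter code (the proposed relaxed production),
    // or "i"/"x" followed by at least one subtag.
    static boolean isXmlLang(String value) {
        if (value == null || value.isEmpty()) {
            return false;
        }
        // Limit -1 keeps trailing empty strings, so "es-" is rejected below.
        String[] parts = value.split("-", -1);
        String primary = parts[0];

        boolean isoCode = (primary.length() == 2 || primary.length() == 3)
                && allLetters(primary);
        boolean prefixed = primary.equalsIgnoreCase("i")
                || primary.equalsIgnoreCase("x");
        if (!isoCode && !(prefixed && parts.length > 1)) {
            return false;
        }
        for (int i = 1; i < parts.length; i++) {
            if (!isSubtag(parts[i])) {
                return false;
            }
        }
        return true;
    }

    // True if every character is an ASCII letter.
    private static boolean allLetters(String s) {
        for (int i = 0; i < s.length(); i++) {
            char c = s.charAt(i);
            boolean letter = (c >= 'a' && c <= 'z') || (c >= 'A' && c <= 'Z');
            if (!letter) {
                return false;
            }
        }
        return true;
    }

    // Assumption: subtags are 1 to 8 ASCII letters or digits (RFC 3066 syntax).
    private static boolean isSubtag(String s) {
        if (s.isEmpty() || s.length() > 8) {
            return false;
        }
        for (int i = 0; i < s.length(); i++) {
            char c = s.charAt(i);
            boolean ok = (c >= 'a' && c <= 'z') || (c >= 'A' && c <= 'Z')
                    || (c >= '0' && c <= '9');
            if (!ok) {
                return false;
            }
        }
        return true;
    }
}
```

With this sketch, "es", "spa", "grc", and "es-MX" would all be
accepted, while "s", "span", and "es-" would be rejected. It checks
only the shape of the tag, not whether the code is actually assigned.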
I have checked both the version of Parser2.java that ships with jing
version 20030619 and the version of Parser2.java in the CVS tree as
of tonight, and they seem the same as far as this problem is
concerned. I have searched the archives of this list, and did not see
anything relevant.
Notes
-----
[1] See http://www.ietf.org/rfc/rfc3066.txt section 2.2 (search for
1st occurrence of "3-letter").
[2] See http://www.w3.org/TR/2004/REC-xml-20040204/#sec-lang-tag.
[3] Of course this isn't really an ideal solution, as it permits all
17,576 combinations of 3 letters, whereas < 500 (i.e., < 3%) of
them are valid codes per 639-2.
--
Syd Bauman, EMT-Paramedic
SGML & XML Programmer/Analyst           North American Editor
Brown University Women Writers Project  Text Encoding Initiative
Syd_Bauman@Brown.edu  401-863-3835      http://www.tei-c.org/