Posted to general@xml.apache.org by Syd Bauman <Sy...@Brown.edu> on 2005/01/14 07:53:05 UTC

crimson (and thus jing) barfs on xml:lang="{ISO 639-2}"

My apologies if this should have been sent elsewhere; I did not find
any specific place to report bugs in either jing or crimson. Feel
free to say "not our problem, send it to XYZ".

When jing comes across
   <foreign xml:lang="spa">este</foreign>
it reports
   error: Illegal xml:lang value "spa".
However, "spa" is the correct ISO 639-2 (3-letter) code for Spanish.
One could argue that, because "es" (the ISO 639-1 2-letter code for
Spanish) is preferred, the error message is overly dramatic but
essentially correct. However, jing also flags xml:lang="grc" as an
error, and there is no 2-letter code for ancient Greek.

ISO 639-2 (3-letter) codes are permitted by RFC 3066[1], which is how
values of xml:lang= are supposed to be interpreted[2].

Jing is relying on a modified version of crimson's parser 1.16. I do
not speak Java, but it seems to me the problem is in the file
Parser2.java in the definition of isXmlLang(). The commented
production is
        // [35] ISO639Code ::= [a-zA-Z] [a-zA-Z]
where (I think) it should be
        // [35] ISO639Code ::= [a-zA-Z] [a-zA-Z] [a-zA-Z]?
with code to match.[3]
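
For illustration only, the relaxed production might be sketched in Java
roughly as follows. This is a stand-alone sketch, not the code in
Parser2.java: the class name XmlLangSketch and both method names are
hypothetical, and it assumes that anything after the primary code is a
series of "-"-separated subcodes made up of ASCII letters. The "i-" and
"x-" forms are left out to keep it short.

```java
// Hypothetical stand-alone sketch; NOT the isXmlLang() from crimson's Parser2.java.
final class XmlLangSketch {

    // True for ASCII letters only, i.e. [a-zA-Z] in the production.
    private static boolean isAsciiLetter(char c) {
        return (c >= 'a' && c <= 'z') || (c >= 'A' && c <= 'Z');
    }

    // Relaxed production: ISO639Code ::= [a-zA-Z] [a-zA-Z] [a-zA-Z]?
    static boolean isIso639Code(String s) {
        if (s.length() < 2 || s.length() > 3) {
            return false;
        }
        for (int i = 0; i < s.length(); i++) {
            if (!isAsciiLetter(s.charAt(i))) {
                return false;
            }
        }
        return true;
    }

    // Assumption: value = ISO639Code ('-' letters+)*
    static boolean isPlausibleXmlLang(String value) {
        // Limit -1 keeps trailing empty strings, so "en-" is rejected below.
        String[] parts = value.split("-", -1);
        if (!isIso639Code(parts[0])) {
            return false;
        }
        for (int i = 1; i < parts.length; i++) {
            if (parts[i].isEmpty()) {
                return false;
            }
            for (int j = 0; j < parts[i].length(); j++) {
                if (!isAsciiLetter(parts[i].charAt(j))) {
                    return false;
                }
            }
        }
        return true;
    }

    public static void main(String[] args) {
        String[] samples = { "es", "spa", "grc", "en-US", "s", "span" };
        for (String sample : samples) {
            System.out.println(sample + " -> " + isPlausibleXmlLang(sample));
        }
    }
}
```

Under these assumptions "es", "spa", and "grc" are all accepted, while a
one-letter or four-letter primary code is still rejected. Like the
production itself, the sketch checks only the shape of the code, not
whether it is actually assigned in ISO 639-2.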

I have checked both the version of Parser2.java that ships with jing
version 20030619 and the version of Parser2.java in the CVS tree as
of tonight, and they seem the same as far as this problem is
concerned. I have searched the archives of this list, and did not see
anything relevant.


Notes
-----
[1] See http://www.ietf.org/rfc/rfc3066.txt section 2.2 (search for
    1st occurrence of "3-letter").
[2] See http://www.w3.org/TR/2004/REC-xml-20040204/#sec-lang-tag.
[3] Of course this isn't really an ideal solution, as it permits all
    17,576 combinations of 3 letters, whereas < 500 (i.e., < 3%) of
    them are valid codes per 639-2.

-- 
 Syd Bauman, EMT-Paramedic
 SGML & XML Programmer/Analyst              North American Editor
 Brown University Women Writers Project     Text Encoding Initiative
 Syd_Bauman@Brown.edu      401-863-3835     http://www.tei-c.org/

