Posted to j-dev@xerces.apache.org by "Michael Glavassevich (JIRA)" <xe...@xml.apache.org> on 2005/02/10 06:43:12 UTC

[jira] Resolved: (XERCESJ-1041) Xerces C++ defines an encoding-string that Xerces/Java refuses to parse

     [ http://issues.apache.org/jira/browse/XERCESJ-1041?page=history ]
     
Michael Glavassevich resolved XERCESJ-1041:
-------------------------------------------

    Resolution: Won't Fix

The encoding names which Xerces-J recognizes are restricted to those registered with IANA [1].

Name: ISO_8859-1:1987                                    [RFC1345,KXS2]
MIBenum: 4
Source: ECMA registry
Alias: iso-ir-100
Alias: ISO_8859-1
Alias: ISO-8859-1 (preferred MIME name)
Alias: latin1
Alias: l1
Alias: IBM819
Alias: CP819
Alias: csISOLatin1

Above are the aliases registered for ISO-8859-1. Xerces-J recognizes all of them. Note that ISO8859-1 is not in this list. I believe the XML spec recommends the use of IANA names to increase the portability of XML documents across parser implementations. Supporting unregistered encoding names harms document portability, and the problem you've run into demonstrates that. Many other parsers won't have any idea what encoding "ISO8859-1" is, since it isn't registered, so you would still have an interoperability problem even if Xerces-J accepted it.

[1] http://www.iana.org/assignments/character-sets
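
To make the alias check concrete, here is a minimal sketch in Java that tests a name against the ISO-8859-1 aliases listed above. It is not the Xerces-J implementation; the class name Latin1Aliases and the method isRegistered are hypothetical, and the case-insensitive comparison is an assumption of this sketch.

```java
import java.util.Arrays;
import java.util.HashSet;
import java.util.Locale;
import java.util.Set;

// Hypothetical helper, not part of Xerces-J: checks a name against the
// IANA-registered aliases for ISO-8859-1 quoted in this message.
public class Latin1Aliases {
    // Aliases copied from the IANA entry above, stored in upper case.
    private static final Set<String> REGISTERED = new HashSet<>(Arrays.asList(
        "ISO_8859-1:1987", "ISO-IR-100", "ISO_8859-1", "ISO-8859-1",
        "LATIN1", "L1", "IBM819", "CP819", "CSISOLATIN1"));

    // Assumption of this sketch: names are compared case-insensitively.
    public static boolean isRegistered(String name) {
        return name != null && REGISTERED.contains(name.toUpperCase(Locale.ROOT));
    }

    public static void main(String[] args) {
        for (String name : new String[] {"ISO-8859-1", "latin1", "ISO8859-1"}) {
            System.out.println(name + " -> " + isRegistered(name));
        }
    }
}
```

Under these assumptions, "ISO-8859-1" and "latin1" are accepted, while "ISO8859-1" (no dash after "ISO") is rejected because it is not among the registered aliases.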

> Xerces C++ defines an encoding-string that Xerces/Java refuses to parse
> -----------------------------------------------------------------------
>
>          Key: XERCESJ-1041
>          URL: http://issues.apache.org/jira/browse/XERCESJ-1041
>      Project: Xerces2-J
>         Type: Bug
>     Versions: 2.4.0
>  Environment: XercesC-2.3, XalanJ 2.4, Solaris 6
>     Reporter: Dominik Stadler

>
> We are using Xerces C++ to create XML-Messages that are later parsed by Xerces/Java.
> XercesC provides a define, XMLUni::fgISO88591EncodingString, for setting the encoding; the resulting XML-Message contains the string "ISO8859-1" as its encoding.
> When we later use Xerces/Java to parse this file, we get the following error:
> [Fatal Error] :1:43: Invalid encoding name "ISO8859-1".
> It seems that Xerces/Java only knows the encoding "ISO-8859-1" (with a dash), but not "ISO8859-1" (without dash).
> The XML-Specification states that "ISO-8859-1" (with a dash) SHOULD be used, look at http://www.w3.org/TR/2004/REC-xml-20040204/#charencoding
> So in my opinion either Xerces C++ should no longer provide that define, or Xerces/Java should be enhanced to accept that encoding-string. Otherwise XercesC and XercesJ differ on this point, where until now we thought their parsing behavior was the same.
> I have already filed a bug at http://issues.apache.org/jira/browse/XERCESC-1336 reporting this for XercesC.

-- 
This message is automatically generated by JIRA.
-
If you think it was sent incorrectly contact one of the administrators:
   http://issues.apache.org/jira/secure/Administrators.jspa
-
If you want more information on JIRA, or have a bug to report see:
   http://www.atlassian.com/software/jira

