Posted to dev@xalan.apache.org by "Simon Schaarschmidt (JIRA)" <ji...@apache.org> on 2018/09/24 14:15:00 UTC

[jira] [Updated] (XALANJ-2618) Error in org/apache/xml/serializer/Encodings.properties

     [ https://issues.apache.org/jira/browse/XALANJ-2618?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Simon Schaarschmidt updated XALANJ-2618:
----------------------------------------
    Description: 
We transform and serialize using encoding ISO-8859-1. With JDK 1.8 everything works, but with OpenJDK 11 the output is written (by class ToTextStream) as character references, e.g. "*&#105;&#100;&#61;&#49;*" instead of "*id=1*".

Various encodings are defined in org/apache/xml/serializer/Encodings.properties (serializer.jar), e.g.

{{ISO8859-1  ISO-8859-1  0x00FF}}
{{ISO8859_1  ISO-8859-1  0x00FF}}
{{{color:#ff0000}8859-1{color}     ISO-8859-1  0x00FF}}
{{{color:#ff0000}8859_1{color}     ISO-8859-1  0x00FF}}

First value: Java encoding name.

Second value: comma-separated preferred MIME names.

The class org.apache.xml.serializer.Encodings reads this file into a Properties object, processes the definitions to create EncodingInfo objects, and puts them (see method loadEncodingInfo()) into the member fields __encodingTableKeyJava_ and __encodingTableKeyMime_ (both Hashtable). Populating _encodingTableKeyMime is the critical part: the mapping is not 1:1, because several Java encoding names share the same MIME name, so the element returned last by Properties.keys() replaces the previously stored EncodingInfo object.
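
The overwrite can be reproduced in isolation. The following is a minimal sketch, not Xalan's actual code: the class MimeTableSketch and the method buildMimeTable are hypothetical names, and a List stands in for whatever order Properties.keys() happens to return.

```java
import java.util.Arrays;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical illustration; not the code of org.apache.xml.serializer.Encodings.
final class MimeTableSketch {

    // Builds a MIME-name -> Java-name table as outlined in the description:
    // each entry is stored under its MIME name, so a later entry silently
    // replaces an earlier one that shares the same MIME name.
    static Map<String, String> buildMimeTable(List<String[]> entriesInKeyOrder) {
        Map<String, String> byMime = new HashMap<>();
        for (String[] entry : entriesInKeyOrder) {
            String javaName = entry[0];
            String mimeName = entry[1];
            byMime.put(mimeName, javaName); // last writer wins
        }
        return byMime;
    }

    public static void main(String[] args) {
        String[] first = {"ISO8859-1", "ISO-8859-1"};
        String[] third = {"8859-1", "ISO-8859-1"};

        // Same entries, two different enumeration orders.
        System.out.println(buildMimeTable(Arrays.asList(third, first)).get("ISO-8859-1"));
        System.out.println(buildMimeTable(Arrays.asList(first, third)).get("ISO-8859-1"));
    }
}
```

With the same two entries, the lookup result depends only on which entry is enumerated last.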

Up to Java 1.8, the first line above is the last entry in the Enumeration, so _encodingTableKeyMime returns the EncodingInfo object with Java encoding "{color:#14892c}ISO8859-1{color}" for encoding "ISO-8859-1". With Java 11, the elements of the Enumeration returned by Properties.keys() have a different order: the third line above is now the last entry. Therefore _encodingTableKeyMime returns the EncodingInfo object with Java encoding "*{color:#ff0000}8859-1{color}*" when asked for encoding "ISO-8859-1". But "8859-1" is not a valid Java encoding name! Method EncodingInfo.inEncoding(char,String) fails internally with an *UnsupportedEncodingException* and returns false, which is why the characters end up as character references.
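
Whether a name is usable as a Java encoding name on a given JVM can be checked with the standard java.nio.charset.Charset API. The sketch below is an illustration under assumptions, not Xalan code: CharsetCheckSketch, isUsable and writeChar are hypothetical names, and writeChar only imitates the fallback described above, where a character reported as not encodable is written as a numeric character reference.

```java
import java.nio.charset.Charset;
import java.nio.charset.IllegalCharsetNameException;

// Hypothetical illustration; not the code of EncodingInfo or ToTextStream.
final class CharsetCheckSketch {

    // True if the running JVM can resolve the name to a charset.
    static boolean isUsable(String javaName) {
        try {
            return Charset.isSupported(javaName);
        } catch (IllegalCharsetNameException e) {
            return false; // syntactically illegal charset name
        }
    }

    // Imitates the reported behaviour: if the encoding name cannot be used,
    // the character counts as "not in the encoding" and is escaped.
    static String writeChar(char c, String javaName) {
        boolean encodable = isUsable(javaName)
                && Charset.forName(javaName).newEncoder().canEncode(c);
        return encodable ? String.valueOf(c) : "&#" + (int) c + ";";
    }

    public static void main(String[] args) {
        String[] names = {"ISO-8859-1", "ISO8859-1", "ISO8859_1", "8859-1", "8859_1"};
        for (String name : names) {
            System.out.println(name + " -> usable: " + isUsable(name)
                    + ", 'i' written as: " + writeChar('i', name));
        }
    }
}
```

Running main on both JDK 1.8 and OpenJDK 11 shows which of the names from Encodings.properties each JVM accepts.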

The methods in class Encodings first look for the EncodingInfo object in _encodingTableKeyJava and use the entries from _encodingTableKeyMime as a fallback.

I suggest extending the definitions in Encodings.properties with additional lines, e.g.

{{*{color:#14892c}ISO-8859-1{color}* ISO-8859-1  0x00FF}}

The same applies to encodings ISO-8859-2 through ISO-8859-9. Alternatively, all entries with Java encoding name "8859*" should be removed. (They are not valid Java encoding names - UnsupportedEncodingException!)

Finally, I think the current mechanism of collecting the EncodingInfo objects in two Hashtables is fragile.
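
One way to make the MIME lookup independent of enumeration order is to decide explicitly which Java name wins. The following is only a sketch of that idea, not a proposed patch: StableMimeTableSketch and its methods are hypothetical, Charset.isSupported is assumed to be an acceptable validity check, and the lexicographic tie-break is an arbitrary choice made purely for determinism.

```java
import java.nio.charset.Charset;
import java.util.Arrays;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

// Hypothetical illustration; not the code of org.apache.xml.serializer.Encodings.
final class StableMimeTableSketch {

    static boolean isUsable(String javaName) {
        try {
            return Charset.isSupported(javaName);
        } catch (IllegalArgumentException e) {
            return false; // illegal or null charset name
        }
    }

    // Builds a MIME-name -> Java-name table whose content does not depend
    // on the order in which the entries are visited.
    static Map<String, String> build(List<String[]> entries) {
        Map<String, String> byMime = new TreeMap<>();
        for (String[] entry : entries) {
            String javaName = entry[0];
            String mimeName = entry[1];
            if (!isUsable(javaName)) {
                continue; // never register a name the JVM cannot resolve
            }
            // Deterministic tie-break: keep the lexicographically smaller name.
            byMime.merge(mimeName, javaName,
                    (old, cur) -> old.compareTo(cur) <= 0 ? old : cur);
        }
        return byMime;
    }

    public static void main(String[] args) {
        String[] a = {"ISO-8859-1", "ISO-8859-1"};
        String[] b = {"ISO8859_1", "ISO-8859-1"};
        String[] c = {"8859-1", "ISO-8859-1"};

        // Both orders yield the same table.
        System.out.println(build(Arrays.asList(a, b, c)));
        System.out.println(build(Arrays.asList(c, b, a)));
    }
}
```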


> Error in org/apache/xml/serializer/Encodings.properties
> -------------------------------------------------------
>
>                 Key: XALANJ-2618
>                 URL: https://issues.apache.org/jira/browse/XALANJ-2618
>             Project: XalanJ2
>          Issue Type: Bug
>      Security Level: No security risk; visible to anyone (Ordinary problems in Xalan projects. Anybody can view the issue.)
>          Components: Serialization, transformation
>    Affects Versions: 2.7.2
>         Environment: Java 11
>            Reporter: Simon Schaarschmidt
>            Assignee: Steven J. Hathaway
>            Priority: Major
>              Labels: Java11



