Posted to dev@santuario.apache.org by Hess Yvan <Yv...@imtf.ch> on 2007/02/13 11:39:09 UTC

Signed document can be corrupted in certain circumstances

Hi everybody,
 
I think I have found a critical bug in XML Security V1.4.0 (Java). An XML document signed with Apache XML Security can be corrupted in certain circumstances.
 
Here are the starting conditions and the results I get:
 
1. An XML document encoded in "UTF-8" that contains the Unicode character "\u263A"
2. The document is signed with Apache XML security --->  OK
3. The document is verified with Apache XML security --->  OK
4. The document is verified with IBM toolkit (XSS4J) ---> NOT OK
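
One plausible reading of steps 2 to 4 is that the signing side and the verifying side share the same canonicalizer, so they digest the same (wrong) bytes and agree with each other, while an independent toolkit digests different bytes. The sketch below only illustrates that reasoning with the JDK. The class and method names are hypothetical, SHA-1 is an assumption (the message does not say which digest was used), and the two strings are the ones reported further down in this message.

```java
import java.nio.charset.StandardCharsets;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;

public class DigestMismatchSketch {
    /** SHA-1 digest of the given bytes (digest algorithm is an assumption). */
    static byte[] sha1(byte[] data) {
        try {
            return MessageDigest.getInstance("SHA-1").digest(data);
        } catch (NoSuchAlgorithmException e) {
            throw new IllegalStateException(e);
        }
    }

    /** True when both byte sequences produce the same digest value. */
    static boolean sameDigest(byte[] a, byte[] b) {
        return MessageDigest.isEqual(sha1(a), sha1(b));
    }

    public static void main(String[] args) {
        // Bytes an independent toolkit would expect (U+263A kept intact).
        byte[] expected = "<document>Humour document (h\u00e9h\u00e9 \u263A)</document>"
                .getBytes(StandardCharsets.UTF_8);
        // Bytes reported in this message (U+263A replaced by ':').
        byte[] reported = "<document>Humour document (h\u00e9h\u00e9 :)</document>"
                .getBytes(StandardCharsets.UTF_8);

        // Same canonicalizer on both sides: the digests agree.
        System.out.println(sameDigest(reported, reported));
        // Independent canonicalizer on the verifying side: the digests differ.
        System.out.println(sameDigest(reported, expected));
    }
}
```

If this reading is right, a signature can only be trusted as interoperable once it has been verified by a second, independent implementation.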
 
After some investigation, I think I have isolated the problem. The error seems to come from the Canonicalizer class, which does not correctly handle characters that take three bytes in UTF-8. Here is a test I wrote to confirm the problem:
 
      // XML character \u263A => &#x0263A; => smiley
      String xmlString = "<document>Humour document (héhé \u263A)</document>";
      byte[] xml = xmlString.getBytes("UTF-8");
      String xmlHex = HexadecimalConvertor.toHex(xml);
      
      System.out.println(xmlString);
      System.out.println("Hexadecimal value: " + xmlHex);
 
      // Get the DOM document
      Document document = new XMLParser().parseXMLDocument(new ByteArrayInputStream(xml));
 
      // Canonical 
      byte[] canonicalXML = Canonicalizer.getInstance(Canonicalizer.ALGO_ID_C14N_WITH_COMMENTS).canonicalizeSubtree(document);
      String canonicalXMLHex = HexadecimalConvertor.toHex(canonicalXML);
      String canonicalXMLString = new String(canonicalXML, "UTF-8");
 
      System.out.println("Hexadecimal value: " + canonicalXMLHex);
      System.out.println(canonicalXMLString);
 
and here is the result:
 
<document>Humour document (héhé ☺)</document>
value: 3c646f63756d656e743e48756d6f757220646f63756d656e74202868c3a968c3a920 e298ba 293c2f646f63756d656e743e
value: 3c646f63756d656e743e48756d6f757220646f63756d656e74202868c3a968c3a920 3a     293c2f646f63756d656e743e
<document>Humour document (héhé :)</document>
 
The Canonicalizer class correctly handles the character "é" (E9), which is encoded in UTF-8 as "c3a9". But the Unicode character "☺" (263A) is output as ":" (3a) when it should be "e298ba"; this is wrong. Note that 3a is exactly the low byte of 263A. It seems that the Canonicalizer class does not correctly handle characters that take three bytes in UTF-8!
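
The expected byte values quoted above can be checked with the JDK alone, without the helper classes used in the test. This is a minimal sketch; the class name Utf8Hex and its method are hypothetical names chosen for illustration.

```java
import java.nio.charset.StandardCharsets;

public class Utf8Hex {
    /** Returns the UTF-8 encoding of s as lowercase hex, two digits per byte. */
    static String utf8Hex(String s) {
        StringBuilder sb = new StringBuilder();
        for (byte b : s.getBytes(StandardCharsets.UTF_8)) {
            sb.append(String.format("%02x", b & 0xff));
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        System.out.println(utf8Hex("\u00e9")); // U+00E9, two bytes:   c3a9
        System.out.println(utf8Hex("\u263A")); // U+263A, three bytes: e298ba
        System.out.println(utf8Hex(":"));      // U+003A, one byte:    3a
    }
}
```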
 
Does anybody have an idea? Can somebody help me? This occurs in our application, and we now have a lot of problems because of it.
 
Thanks in advance.
 
Regards. Yvan Hess
 
 

Re: Signed document can be corrupted in certain circumstances

Posted by Raul Benito <ra...@apache.org>.
Hi Hess,

It is my fault; we have a critical bug:
http://issues.apache.org/bugzilla/show_bug.cgi?id=41462 . The problem
is that I was thinking in 8 bits instead of 32 bits. It is now mostly
fixed in head, but we still have a problem with some parts of Unicode.
I think I will do a 1.4.1 release with a fix for this bug and several
others. We also have to reconsider the release strategy, as it seems
that few people, if any, test the release candidates :(.
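
To illustrate the "8 bits instead of 32 bits" remark, here is a minimal sketch of how keeping only the low byte of a code point turns U+263A into ":" and what a full UTF-8 encoding of one code point looks like. This is not the actual Santuario code: the class and method names are hypothetical, and truncatingEncode is only a guess at the kind of mistake that would produce the reported bytes.

```java
public class Utf8EncodeSketch {
    /** Hypothetical faulty path: keeps only the low 8 bits of the code point. */
    static byte[] truncatingEncode(int cp) {
        return new byte[] {(byte) cp};
    }

    /** Encodes one code point as UTF-8; assumes cp is a valid, non-surrogate value. */
    static byte[] encode(int cp) {
        if (cp < 0x80) {            // 1 byte: 0xxxxxxx
            return new byte[] {(byte) cp};
        }
        if (cp < 0x800) {           // 2 bytes: 110xxxxx 10xxxxxx
            return new byte[] {(byte) (0xC0 | (cp >> 6)), (byte) (0x80 | (cp & 0x3F))};
        }
        if (cp < 0x10000) {         // 3 bytes: 1110xxxx 10xxxxxx 10xxxxxx
            return new byte[] {(byte) (0xE0 | (cp >> 12)),
                               (byte) (0x80 | ((cp >> 6) & 0x3F)),
                               (byte) (0x80 | (cp & 0x3F))};
        }
        // 4 bytes: 11110xxx 10xxxxxx 10xxxxxx 10xxxxxx
        return new byte[] {(byte) (0xF0 | (cp >> 18)),
                           (byte) (0x80 | ((cp >> 12) & 0x3F)),
                           (byte) (0x80 | ((cp >> 6) & 0x3F)),
                           (byte) (0x80 | (cp & 0x3F))};
    }

    /** Lowercase hex, two digits per byte. */
    static String hex(byte[] bytes) {
        StringBuilder sb = new StringBuilder();
        for (byte b : bytes) {
            sb.append(String.format("%02x", b & 0xff));
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        System.out.println(hex(truncatingEncode(0x263A))); // 3a, the reported output
        System.out.println(hex(encode(0x263A)));           // e298ba, the expected output
        System.out.println(hex(encode(0xE9)));             // c3a9
    }
}
```

The four-byte branch matters for characters outside the Basic Multilingual Plane, which a DOM string holds as a surrogate pair that has to be combined into one code point before encoding.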



-- 
http://r-bg.com