Posted to j-users@xerces.apache.org by Markus Jais <mj...@gmx-gmbh.de> on 2004/02/25 09:36:37 UTC

problem with DOM and Encoding

Hi,

I have two different methods for outputting a DOM tree:


Method 1: (DOM 3 / Xerces 2.4)
===========
public String toXML2() throws RPCException
{
        String ret = "UNO";

        try {
                DOMWriter writer = ((DOMImplementationLS) this.implementation).createDOMWriter();
                writer.setEncoding("ISO-8859-1");
                ret += writer.writeToString(document);
        } catch (DOMException domex) {
                // report the failure instead of silently swallowing it
                throw new RPCException("Error serializing DOM: " + domex.getMessage());
        }
        return ret;
}
        

Method 2: (JAXP + TrAX with Xerces 2.4)
============
public String toXML() throws RPCException
{
        String ret = "DOS";
        try {
                TransformerFactory transFactory = TransformerFactory.newInstance();
                Transformer transformer = transFactory.newTransformer();
                Source input = new DOMSource(document);
                // closing a StringWriter has no effect, so close() is not called
                StringWriter strWriter = new StringWriter();
                Result output = new StreamResult(strWriter);
                transformer.setOutputProperty(OutputKeys.ENCODING, "ISO-8859-1");
                transformer.transform(input, output);
                ret += strWriter.toString();
                return ret;
        } catch (TransformerException ex) {
                // also covers TransformerConfigurationException, its subclass;
                // rethrowing makes every path either return or throw (with
                // empty catch blocks the method had no return value here)
                throw new RPCException("Error serializing DOM: " + ex.getMessage());
        }
}
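
A note on method 2, offered as a sketch rather than a diagnosis: a StringWriter
collects Java chars, not bytes, so as far as I can tell the ENCODING property
can only affect the XML declaration and which characters are written as numeric
character references. If real ISO-8859-1 bytes are wanted, one option is to
serialize to an OutputStream. The class name Latin1BytesDemo, its helper
methods and the <city> element are made up for illustration, and the code
assumes Java 7 or later (for StandardCharsets), not the 1.4.2 used in this post.

```java
import java.io.ByteArrayOutputStream;
import java.nio.charset.StandardCharsets;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.transform.OutputKeys;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.dom.DOMSource;
import javax.xml.transform.stream.StreamResult;
import org.w3c.dom.Document;
import org.w3c.dom.Element;

// Hypothetical demo class, not part of the original post.
final class Latin1BytesDemo {

    // Builds a tiny document whose text contains a u-umlaut (U+00FC).
    static Document sampleDocument() {
        try {
            Document doc = DocumentBuilderFactory.newInstance()
                    .newDocumentBuilder().newDocument();
            Element city = doc.createElement("city");
            city.appendChild(doc.createTextNode("M\u00fcnchen"));
            doc.appendChild(city);
            return doc;
        } catch (Exception ex) {
            throw new IllegalStateException("could not build sample document", ex);
        }
    }

    // Serializes to a byte stream, so the serializer itself does the encoding.
    static byte[] toLatin1Bytes(Document doc) {
        try {
            Transformer t = TransformerFactory.newInstance().newTransformer();
            t.setOutputProperty(OutputKeys.ENCODING, "ISO-8859-1");
            ByteArrayOutputStream out = new ByteArrayOutputStream();
            t.transform(new DOMSource(doc), new StreamResult(out));
            return out.toByteArray();
        } catch (Exception ex) {
            throw new IllegalStateException("serialization failed", ex);
        }
    }

    public static void main(String[] args) {
        byte[] bytes = toLatin1Bytes(sampleDocument());
        // Decode with the same charset that was requested for the output.
        System.out.println(new String(bytes, StandardCharsets.ISO_8859_1));
    }
}
```

Whoever consumes the bytes has to decode them as ISO-8859-1 as well; mixing
charsets at that boundary is a common source of broken umlauts.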




Both return a string.
Method 1 turns "München" into:
M&#xfffd;nchen

and method 2 into:
M&#65533;nchen

(If your news reader does not display German umlauts: in HTML you would
write "M&uuml;nchen".)


It seems that both encodings are wrong, and also different from each other,
which is weird. I also tried

export LANG=de_DE

but nothing changed, although within my JBoss message-driven bean it works
with this variable set.

I am running SuSE Linux 8.2 and Java version "1.4.2_03".


Any ideas what's wrong here?
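
One observation on the two outputs quoted above that may help narrow this
down: hexadecimal fffd and decimal 65533 are the same number, so both methods
emit a character reference to the same code point, U+FFFD (the Unicode
replacement character). That would suggest the umlaut was already lost before
serialization, for example if ISO-8859-1 bytes were decoded as UTF-8 when the
text entered the DOM. That cause is only a guess; the sketch below merely shows
that such a mix-up produces U+FFFD. ReplacementCharDemo and misdecode are
hypothetical names, and the code assumes Java 7 or later.

```java
import java.nio.charset.StandardCharsets;

// Hypothetical demo class, not part of the original post.
final class ReplacementCharDemo {

    // Encodes text as ISO-8859-1, then (wrongly) decodes the bytes as UTF-8.
    static String misdecode(String text) {
        byte[] latin1 = text.getBytes(StandardCharsets.ISO_8859_1);
        return new String(latin1, StandardCharsets.UTF_8);
    }

    public static void main(String[] args) {
        // The hex reference from method 1 equals the decimal one from method 2.
        System.out.println(Integer.parseInt("fffd", 16)); // prints 65533

        // The byte 0xFC is not valid UTF-8, so the decoder substitutes U+FFFD.
        String broken = misdecode("M\u00fcnchen");
        System.out.println(Integer.toHexString(broken.charAt(1))); // prints fffd
    }
}
```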



Markus
-- 
Markus Jais
Software Developer
GMX GmbH
Riesstraße 17, 80992 München
Phone: +49 89 14 339-514
mailto:mjais@gmx-gmbh.de
http://www.gmx.de



---------------------------------------------------------------------
To unsubscribe, e-mail: xerces-j-user-unsubscribe@xml.apache.org
For additional commands, e-mail: xerces-j-user-help@xml.apache.org


RE: XMLSerializer

Posted by John Hughes <jo...@entegrity.com>.
Found what was causing this.

When using setPreserveSpace(false) in the OutputFormat the serializer
converts '\n' to ' '.

What is the rationale for this?

John

> -----Original Message-----
> From: John Hughes [mailto:john.hughes@entegrity.com]
> Sent: 25 February 2004 11:30
> To: xerces-j-user@xml.apache.org
> Subject: XMLSerializer
>
>
> I'm using XMLSerializer to output some base64-encoded information within
> certain elements.  The base64 information has '\n' line separators.  However,
> after serialization these '\n' characters are converted into spaces.
>
> Has anyone seen this? And how can I preserve the '\n' values so they are
> not converted?
>
>
> john


XMLSerializer

Posted by John Hughes <jo...@entegrity.com>.
I'm using XMLSerializer to output some base64-encoded information within
certain elements.  The base64 information has '\n' line separators.  However,
after serialization these '\n' characters are converted into spaces.

Has anyone seen this? And how can I preserve the '\n' values so they are
not converted?
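
Whether the converted line breaks matter depends on the consumer of the data.
As a rough sketch, assuming the receiving side can use a lenient decoder: the
MIME decoder in java.util.Base64 (Java 8 or later, so newer than the JDKs of
this thread) ignores characters outside the base64 alphabet, spaces included.
Base64WhitespaceDemo, wrap and decodeLenient are hypothetical names; this does
not preserve the '\n' characters, it only shows a case where losing them is
harmless.

```java
import java.nio.charset.StandardCharsets;
import java.util.Arrays;
import java.util.Base64;

// Hypothetical demo class, not part of the original post.
final class Base64WhitespaceDemo {

    // Encodes with a '\n' after every 16 output characters.
    static String wrap(byte[] payload) {
        return Base64.getMimeEncoder(16, new byte[] {'\n'}).encodeToString(payload);
    }

    // The MIME decoder skips any character that is not in the base64 alphabet.
    static byte[] decodeLenient(String text) {
        return Base64.getMimeDecoder().decode(text);
    }

    public static void main(String[] args) {
        byte[] payload = "some payload stored in an element"
                .getBytes(StandardCharsets.US_ASCII);
        String wrapped = wrap(payload);
        // Simulate the serializer turning each '\n' into a space.
        String flattened = wrapped.replace('\n', ' ');
        System.out.println(Arrays.equals(payload, decodeLenient(flattened)));
    }
}
```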


john

