Posted to j-users@xerces.apache.org by Rob Davis-5 <te...@robertjdavis.co.uk> on 2008/03/11 21:08:55 UTC

LSoutput setEncoding("UTF-8") ignored - output XML not UTF-8 compliant

I am reading in a UTF-8 XML SVG document, processing it, and then
writing it back out. However, the encoding of the output differs from
that of the original document; it should remain UTF-8. The SVG editor
Inkscape rejects the output document because of this.

I have searched for "UTF-8", "encoding", and both terms together, but
can't find a problem even similar to mine.

Here is my code to output the file (solution 1):

		DOMImplementationRegistry registry =
			DOMImplementationRegistry.newInstance();

		DOMImplementationLS lsImpl =
			(DOMImplementationLS) registry.getDOMImplementation("LS");

		/* Serialize the document */
		LSSerializer serializer = lsImpl.createLSSerializer();

		LSOutput output = lsImpl.createLSOutput();
		output.setEncoding("UTF-8");

		System.out.println("encoding: " + output.getEncoding());

		/* Send the output to the file through a character stream */
		output.setCharacterStream(new FileWriter(new File("C:\\file.svg")));

		serializer.write(doc, output);


Here is the fragment of the file where the encoding is different:

input svg xml file:

   <glyph horiz-adv-x="833" unicode="©"> etc...

- Inkscape can read this

output svg file:

   <glyph horiz-adv-x="833" unicode="©"> etc...

- Inkscape says it cannot read this because it is not proper UTF-8
- I know this line is the cause because if I remove it, Inkscape can
open the file OK
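
To illustrate the symptom described above, here is a small sketch of why
a single © can make a file invalid as UTF-8. It assumes the output was
written in a single-byte encoding such as ISO-8859-1, which the post does
not state; the class and method names are made up for the example.

```java
import java.nio.ByteBuffer;
import java.nio.charset.CharacterCodingException;
import java.nio.charset.CodingErrorAction;
import java.nio.charset.StandardCharsets;

class Utf8CheckSketch {

    /** Returns true only if the bytes decode as strict UTF-8. */
    public static boolean isValidUtf8(byte[] bytes) {
        try {
            StandardCharsets.UTF_8.newDecoder()
                .onMalformedInput(CodingErrorAction.REPORT)
                .onUnmappableCharacter(CodingErrorAction.REPORT)
                .decode(ByteBuffer.wrap(bytes));
            return true;
        } catch (CharacterCodingException e) {
            return false;
        }
    }

    public static void main(String[] args) {
        String attribute = "unicode=\"\u00A9\"";

        // In ISO-8859-1 the copyright sign is the single byte 0xA9.
        byte[] latin1 = attribute.getBytes(StandardCharsets.ISO_8859_1);
        // In UTF-8 it is the two-byte sequence 0xC2 0xA9.
        byte[] utf8 = attribute.getBytes(StandardCharsets.UTF_8);

        System.out.println(isValidUtf8(latin1)); // prints false
        System.out.println(isValidUtf8(utf8));   // prints true
    }
}
```

A lone 0xA9 is a UTF-8 continuation byte with no lead byte in front of
it, so a strict UTF-8 reader has to reject the file.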


However, if I use the following code (solution 2), it works, i.e. the
encoding is correct. *BUT* this method uses StringBuffers, which are not
large enough: only part of the file gets written, so I really need to
use solution 1 above.

This proves to me that the problem is the output encoding and NOT
anything to do with the input parsing.

Solution 2:

		DOMImplementationRegistry registry =
			DOMImplementationRegistry.newInstance();

		DOMImplementationLS lsImpl =
			(DOMImplementationLS) registry.getDOMImplementation("LS");

		/* Serialize the document into an in-memory string */
		LSSerializer serializer = lsImpl.createLSSerializer();
		LSOutput output = lsImpl.createLSOutput();

		StringWriter stringWriter = new StringWriter();
		output.setCharacterStream(stringWriter);
		serializer.write(doc, output);

		/* Write the string to the file, encoding it explicitly as UTF-8 */
		FileOutputStream fileOutputStream =
			new FileOutputStream("C:\\file.svg");
		OutputStreamWriter outputStreamWriter =
			new OutputStreamWriter(fileOutputStream, "UTF-8");

		outputStreamWriter.write(stringWriter.toString());



-- 
View this message in context: http://www.nabble.com/LSoutput-setEncoding%28%22UTF-8%22%29-ignored---output-XML-not-UTF-8-compliant-tp15988542p15988542.html
Sent from the Xerces - J - Users mailing list archive at Nabble.com.


---------------------------------------------------------------------
To unsubscribe, e-mail: j-users-unsubscribe@xerces.apache.org
For additional commands, e-mail: j-users-help@xerces.apache.org


Re: LSoutput setEncoding("UTF-8") ignored - output XML not UTF-8 compliant

Posted by Stanimir Stamenkov <s7...@netscape.net>.
Tue, 11 Mar 2008 13:08:55 -0700 (PDT), /Rob Davis-5/:

> So in conclusion if I use solution 2 - then I get the desired encoding - 
> I've got a work around - my problem is solved. But 
> LSoutput setEncoding("UTF-8") 
>  
> DOES NOT WORK 
> so i'd still like to know why

In both cases you're serializing to a character stream, where the
character stream controls the encoding.  Just don't do that: set a
byte stream on the LSOutput instead, so the serializer can control the
encoding.  As to the exact reason why your first case fails, you should
have read the FileWriter class documentation
<http://java.sun.com/j2se/1.5.0/docs/api/java/io/FileWriter.html>:

> Convenience class for writing character files. The constructors of 
> this class assume that the default character encoding and the 
> default byte-buffer size are acceptable. To specify these values 
> yourself, construct an OutputStreamWriter on a FileOutputStream.
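
A minimal sketch of the byte-stream approach described above, assuming a
DOM Document is already at hand. The class name, method name, and the
tiny test document in main are invented for the example; the LS calls
are the same ones used in the original post, with setByteStream in place
of setCharacterStream.

```java
import java.io.ByteArrayOutputStream;
import java.io.OutputStream;
import java.nio.charset.StandardCharsets;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.bootstrap.DOMImplementationRegistry;
import org.w3c.dom.ls.DOMImplementationLS;
import org.w3c.dom.ls.LSOutput;
import org.w3c.dom.ls.LSSerializer;

class ByteStreamSerializeSketch {

    /** Serializes doc to out; the serializer itself encodes to UTF-8. */
    public static void writeUtf8(Document doc, OutputStream out) throws Exception {
        DOMImplementationRegistry registry = DOMImplementationRegistry.newInstance();
        DOMImplementationLS lsImpl =
            (DOMImplementationLS) registry.getDOMImplementation("LS");

        LSSerializer serializer = lsImpl.createLSSerializer();
        LSOutput output = lsImpl.createLSOutput();

        // With a byte stream, this setting decides how characters become bytes.
        output.setEncoding("UTF-8");
        output.setByteStream(out);

        serializer.write(doc, output);
    }

    public static void main(String[] args) throws Exception {
        // Hypothetical one-element document standing in for the SVG file.
        Document doc = DocumentBuilderFactory.newInstance()
            .newDocumentBuilder().newDocument();
        Element glyph = doc.createElement("glyph");
        glyph.setAttribute("unicode", "\u00A9");
        doc.appendChild(glyph);

        ByteArrayOutputStream buffer = new ByteArrayOutputStream();
        writeUtf8(doc, buffer);
        System.out.println(new String(buffer.toByteArray(), StandardCharsets.UTF_8));
    }
}
```

To write to a file, pass a FileOutputStream instead of the in-memory
buffer and close it after the call.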

-- 
Stanimir



Re: LSoutput setEncoding("UTF-8") ignored - output XML not UTF-8 compliant

Posted by Rob Davis-5 <te...@robertjdavis.co.uk>.


In solution 2 I missed out the line

		outputStreamWriter.close();

With it, the entire XML is written to the file, and Inkscape accepts
and displays the file.


So in conclusion, if I use solution 2, then I get the desired encoding.
I've got a workaround, so my problem is solved. But
LSoutput setEncoding("UTF-8")

DOES NOT WORK
so I'd still like to know why.

I would still appreciate some advice, as the StringBuffer solution may
not always work if the file gets too big!
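
On the concern about large files: one possible variation on solution 2,
sketched here under the assumption of Java 7 or later, hands the
OutputStreamWriter to the LSOutput directly, so the document is never
held in a StringWriter. The class and method names are invented for the
example. The writer, not the serializer, does the encoding here;
setEncoding is kept so that the declared encoding should match what the
writer produces.

```java
import java.io.File;
import java.io.FileOutputStream;
import java.io.OutputStreamWriter;
import java.io.Writer;
import java.nio.charset.StandardCharsets;
import org.w3c.dom.Document;
import org.w3c.dom.bootstrap.DOMImplementationRegistry;
import org.w3c.dom.ls.DOMImplementationLS;
import org.w3c.dom.ls.LSOutput;
import org.w3c.dom.ls.LSSerializer;

class StreamingWriterSketch {

    /** Serializes doc straight to target through a UTF-8 writer. */
    public static void writeUtf8(Document doc, File target) throws Exception {
        DOMImplementationRegistry registry = DOMImplementationRegistry.newInstance();
        DOMImplementationLS lsImpl =
            (DOMImplementationLS) registry.getDOMImplementation("LS");

        LSSerializer serializer = lsImpl.createLSSerializer();
        LSOutput output = lsImpl.createLSOutput();
        output.setEncoding("UTF-8");

        // try-with-resources closes (and therefore flushes) the writer,
        // which is the step that was missing from solution 2.
        try (Writer writer = new OutputStreamWriter(
                new FileOutputStream(target), StandardCharsets.UTF_8)) {
            output.setCharacterStream(writer);
            serializer.write(doc, output);
        }
    }
}
```

A call would look like
StreamingWriterSketch.writeUtf8(doc, new File("C:\\file.svg")).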

-- 
View this message in context: http://www.nabble.com/LSoutput-setEncoding%28%22UTF-8%22%29-ignored---output-XML-not-UTF-8-compliant-tp15988542p15988561.html
Sent from the Xerces - J - Users mailing list archive at Nabble.com.

