Posted to j-users@xerces.apache.org by Rob Davis-5 <te...@robertjdavis.co.uk> on 2008/03/11 21:08:55 UTC
LSoutput setEncoding("UTF-8") ignored - output XML not UTF-8
compliant
I am reading in a UTF-8 XML SVG document
I process it
Then I wish to output it
However, the encoding of the output is different from the original
document; it should remain UTF-8
The SVG editor Inkscape rejects the document because of this
I have searched "UTF-8", "encoding" and both terms but can't find a problem
even similar to mine
Here is my code to output the file (solution 1):
DOMImplementationRegistry registry = DOMImplementationRegistry.newInstance( );
DOMImplementationLS lsImpl = (DOMImplementationLS)registry.getDOMImplementation("LS");
/* Serialize the document */
LSSerializer serializer = lsImpl.createLSSerializer( );
LSOutput output = lsImpl.createLSOutput( );
output.setEncoding("UTF-8");
System.out.println( "encoding: " + output.getEncoding() );
output.setCharacterStream(new FileWriter(new File("C:\\file.svg")));
output.setCharacterStream( stringWriter );
serializer.write(doc, output);
Here is the fragment of the file where the encoding is different:
input svg xml file:
<glyph horiz-adv-x="833" unicode="©"> etc...
- inkscape can read
output svg file:
<glyph horiz-adv-x="833" unicode="©"> etc...
- Inkscape says it cannot read it because it is not proper UTF-8
- I know this because if I remove the line, then Inkscape can open the file
OK
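The two fragments look the same as text; the difference is in the bytes written to disk. The sketch below shows how the copyright sign U+00A9 is encoded under different charsets. It assumes the failing output was written in a single-byte platform default such as ISO-8859-1 or windows-1252; the thread does not say which charset was actually in effect. The class and method names (`CopyrightSignBytes`, `hex`) are made up for illustration, and `StandardCharsets` needs Java 7 or later.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

class CopyrightSignBytes {
    /** Returns the bytes of s in the given charset as space-separated hex. */
    static String hex(String s, Charset cs) {
        StringBuilder sb = new StringBuilder();
        for (byte b : s.getBytes(cs)) {
            if (sb.length() > 0) {
                sb.append(' ');
            }
            sb.append(String.format("%02X", b & 0xFF));
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        String copyright = "\u00A9"; // the copyright sign from the glyph element
        // UTF-8 needs two bytes for U+00A9.
        System.out.println("UTF-8:      " + hex(copyright, StandardCharsets.UTF_8));
        // ISO-8859-1 (assumed here as a stand-in for the platform default) needs one.
        System.out.println("ISO-8859-1: " + hex(copyright, StandardCharsets.ISO_8859_1));
        // Whatever this JVM uses when no charset is given, e.g. by FileWriter.
        System.out.println("default (" + Charset.defaultCharset() + "): "
                + hex(copyright, Charset.defaultCharset()));
    }
}
```

A lone `A9` byte is a UTF-8 continuation byte with no lead byte before it, so a file that declares UTF-8 but contains it is not valid UTF-8. That would be consistent with Inkscape rejecting the file.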
However if I use the following code (solution 2) it works - i.e. the
encoding is correct, *BUT* this method uses StringBuffers which are not
large enough - only part of the file gets written so I need to use solution
(1) above really.
This proves to me it is the output encoding and NOT anything to do with the
input parsing.
Solution 2:
DOMImplementationRegistry registry = DOMImplementationRegistry.newInstance( );
DOMImplementationLS lsImpl = (DOMImplementationLS)registry.getDOMImplementation("LS");
/* Serialize the document */
LSSerializer serializer = lsImpl.createLSSerializer( );
LSOutput output = lsImpl.createLSOutput( );
StringWriter stringWriter = new StringWriter();
output.setCharacterStream( stringWriter );
FileOutputStream fileOutputStream = new FileOutputStream( "C:\\file.svg" );
OutputStreamWriter outputStreamWriter = new OutputStreamWriter( fileOutputStream, "UTF-8" );
outputStreamWriter.write( stringWriter.toString() );
--
View this message in context: http://www.nabble.com/LSoutput-setEncoding%28%22UTF-8%22%29-ignored---output-XML-not-UTF-8-compliant-tp15988542p15988542.html
Sent from the Xerces - J - Users mailing list archive at Nabble.com.
---------------------------------------------------------------------
To unsubscribe, e-mail: j-users-unsubscribe@xerces.apache.org
For additional commands, e-mail: j-users-help@xerces.apache.org
Re: LSoutput setEncoding("UTF-8") ignored - output XML not UTF-8
compliant
Posted by Stanimir Stamenkov <s7...@netscape.net>.
Tue, 11 Mar 2008 13:08:55 -0700 (PDT), /Rob Davis-5/:
> So in conclusion if I use solution 2 - then I get the desired encoding -
> I've got a work around - my problem is solved. But
> LSoutput setEncoding("UTF-8")
>
> DOES NOT WORK
> so i'd still like to know why
In both cases you're serializing to a character stream, where the
character stream controls the encoding. Just don't do that; set a
byte stream on the LSOutput instead, so the serializer can control
the encoding. As to the exact reason why your first case fails, you
should have read the FileWriter class documentation
<http://java.sun.com/j2se/1.5.0/docs/api/java/io/FileWriter.html>:
> Convenience class for writing character files. The constructors of
> this class assume that the default character encoding and the
> default byte-buffer size are acceptable. To specify these values
> yourself, construct an OutputStreamWriter on a FileOutputStream.
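
The byte-stream approach suggested above could look roughly like the following. This is a minimal sketch, not code from the thread: the class name `ByteStreamSerialize`, the helpers `writeUtf8` and `sampleDoc`, and the output path `file.svg` are made up for illustration. It also assumes the DOM Level 3 LS implementation found through `DOMImplementationRegistry` behaves like the one the original poster was using.

```java
import java.io.FileOutputStream;
import java.io.OutputStream;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.bootstrap.DOMImplementationRegistry;
import org.w3c.dom.ls.DOMImplementationLS;
import org.w3c.dom.ls.LSOutput;
import org.w3c.dom.ls.LSSerializer;

class ByteStreamSerialize {
    /** Serializes doc to out, letting the serializer do the UTF-8 encoding. */
    static void writeUtf8(Document doc, OutputStream out) throws Exception {
        DOMImplementationRegistry registry = DOMImplementationRegistry.newInstance();
        DOMImplementationLS lsImpl =
                (DOMImplementationLS) registry.getDOMImplementation("LS");
        LSSerializer serializer = lsImpl.createLSSerializer();
        LSOutput output = lsImpl.createLSOutput();
        output.setEncoding("UTF-8");
        // A byte stream, not a character stream: no Writer sits in between
        // to apply its own (possibly platform-default) encoding.
        output.setByteStream(out);
        serializer.write(doc, output);
    }

    /** Builds a tiny stand-in document containing a non-ASCII attribute value. */
    static Document sampleDoc() throws Exception {
        Document doc = DocumentBuilderFactory.newInstance()
                .newDocumentBuilder().newDocument();
        Element glyph = doc.createElement("glyph");
        glyph.setAttribute("horiz-adv-x", "833");
        glyph.setAttribute("unicode", "\u00A9");
        doc.appendChild(glyph);
        return doc;
    }

    public static void main(String[] args) throws Exception {
        // "file.svg" is a hypothetical output path.
        try (OutputStream out = new FileOutputStream("file.svg")) {
            writeUtf8(sampleDoc(), out);
        }
    }
}
```

Because the document is streamed straight to the file, nothing has to be held in an intermediate StringWriter first.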
--
Stanimir
Re: LSoutput setEncoding("UTF-8") ignored - output XML not UTF-8
compliant
Posted by Rob Davis-5 <te...@robertjdavis.co.uk>.
In solution 2 - I missed out the line
outputStreamWriter.close();
So the entire XML is written to the file
And inkscape accepts and displays the file
So in conclusion if I use solution 2 - then I get the desired encoding -
I've got a work around - my problem is solved. But
LSoutput setEncoding("UTF-8")
DOES NOT WORK
so i'd still like to know why
I would still appreciate some advice - as the StringBuffer solution may not
always work if the file gets too big!!!
doesnt work
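
On the file-size concern: a character stream can still be used without collecting the whole document in a StringWriter first, by handing the serializer an OutputStreamWriter with an explicit charset. The sketch below assumes that approach. The class name `WriterSerialize`, the helpers `write` and `sampleDoc`, and the path `file.svg` are made up for illustration and do not come from the thread.

```java
import java.io.FileOutputStream;
import java.io.OutputStream;
import java.io.OutputStreamWriter;
import java.io.Writer;
import java.nio.charset.StandardCharsets;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.bootstrap.DOMImplementationRegistry;
import org.w3c.dom.ls.DOMImplementationLS;
import org.w3c.dom.ls.LSOutput;
import org.w3c.dom.ls.LSSerializer;

class WriterSerialize {
    /** Serializes doc through a Writer that encodes to UTF-8 explicitly. */
    static void write(Document doc, OutputStream rawOut) throws Exception {
        DOMImplementationRegistry registry = DOMImplementationRegistry.newInstance();
        DOMImplementationLS lsImpl =
                (DOMImplementationLS) registry.getDOMImplementation("LS");
        LSSerializer serializer = lsImpl.createLSSerializer();
        LSOutput output = lsImpl.createLSOutput();
        // Closing the writer flushes it, which is the step solution 2 missed.
        try (Writer writer = new OutputStreamWriter(rawOut, StandardCharsets.UTF_8)) {
            // With a character stream the Writer decides the bytes on disk;
            // setEncoding is kept so the declared encoding matches the Writer.
            output.setEncoding("UTF-8");
            output.setCharacterStream(writer);
            serializer.write(doc, output);
        }
    }

    /** Builds a tiny stand-in document containing a non-ASCII attribute value. */
    static Document sampleDoc() throws Exception {
        Document doc = DocumentBuilderFactory.newInstance()
                .newDocumentBuilder().newDocument();
        Element glyph = doc.createElement("glyph");
        glyph.setAttribute("unicode", "\u00A9");
        doc.appendChild(glyph);
        return doc;
    }

    public static void main(String[] args) throws Exception {
        // "file.svg" is a hypothetical output path.
        write(sampleDoc(), new FileOutputStream("file.svg"));
    }
}
```

The serializer writes to the file as it goes, so memory use should not grow with an in-memory copy of the output.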