
Posted to user@poi.apache.org by Christian Märzinger <ch...@gmail.com> on 2011/01/20 21:13:09 UTC

Encoding of InputStream and OutputStream

Hello!

I am reading some PackageParts of a docx file.
Each part is parsed into an org.w3c.dom.Document. When I write the
document back out to the part's OutputStream, some characters are corrupted.

These are characters like
ö, ä, ü and so on.

Is there something I should check?
My code is enclosed below.

Thanks a lot!

Greetings

Christian

Reading from InputStream

	DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
	DocumentBuilder builder;

	try {
		factory.setIgnoringElementContentWhitespace(true);
		builder = factory.newDocumentBuilder();
		domDocument = builder.parse(coreDoc.getInputStream(), "UTF-8");
	} catch....


Writing to OutputStream

	TransformerFactory tf = TransformerFactory.newInstance();

	Transformer transformer = tf.newTransformer();
	transformer.setOutputProperty(OutputKeys.STANDALONE, "yes");

	domDocument.setXmlStandalone(true);
	DOMSource source = new DOMSource(domDocument);
	StringWriter writer = new StringWriter();
	StreamResult result = new StreamResult(writer);

	transformer.transform(source, result);
	String strDocument = writer.toString();
	mpOut = coreDoc.getOutputStream();
	mpOut.write(strDocument.getBytes(),0,strDocument.length());
	mpOut.close();



Re: Encoding of InputStream and OutputStream

Posted by Nick Burch <ni...@alfresco.com>.

On Thu, 20 Jan 2011, Christian Märzinger wrote:
> I am reading some PackageParts of a docx file.
> Each part is parsed into an org.w3c.dom.Document. When I write the
> document back out to the part's OutputStream, some characters are corrupted.

Input and Output streams are byte level, while Readers and Writers are 
character level. You need to specify the character encoding explicitly 
whenever you move between the two. If you use the POI usermodel classes, we 
do that for you; if you do the low-level stuff yourself, then you need to 
handle it yourself...

Nick
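
A minimal sketch of what handling the encoding yourself could look like. It is not taken from the thread and is only one possible approach. In the posted code, strDocument.getBytes() uses the platform default charset, and strDocument.length() counts characters rather than bytes, so multi-byte characters such as ö, ä, ü can be both mis-encoded and cut off. Also, the second argument of DocumentBuilder.parse(InputStream, String) is a system ID, not an encoding. The sketch avoids the String detour by letting the Transformer write bytes directly to the OutputStream. The class name EncodingRoundTrip and its methods are hypothetical, and in-memory streams stand in for coreDoc.getInputStream() and coreDoc.getOutputStream().

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.InputStream;
import java.io.OutputStream;
import java.nio.charset.StandardCharsets;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.transform.OutputKeys;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.dom.DOMSource;
import javax.xml.transform.stream.StreamResult;
import org.w3c.dom.Document;

// Hypothetical helper class, for illustration only.
public class EncodingRoundTrip {

    // Parse from bytes. The XML parser reads the encoding from the
    // XML declaration (or BOM), so no charset is passed here.
    public static Document read(InputStream in) throws Exception {
        DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
        // Assumption: namespace awareness is wanted for docx parts;
        // the original post did not set this.
        factory.setNamespaceAware(true);
        DocumentBuilder builder = factory.newDocumentBuilder();
        return builder.parse(in);
    }

    // Serialize straight to the byte stream with an explicit encoding,
    // instead of going through a String and getBytes().
    public static void write(Document doc, OutputStream out) throws Exception {
        Transformer transformer = TransformerFactory.newInstance().newTransformer();
        transformer.setOutputProperty(OutputKeys.ENCODING, "UTF-8");
        transformer.setOutputProperty(OutputKeys.STANDALONE, "yes");
        doc.setXmlStandalone(true);
        transformer.transform(new DOMSource(doc), new StreamResult(out));
    }

    // Read and re-write a document entirely in memory.
    public static byte[] roundTrip(byte[] xmlBytes) throws Exception {
        Document doc = read(new ByteArrayInputStream(xmlBytes));
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        write(doc, out);
        return out.toByteArray();
    }

    public static void main(String[] args) throws Exception {
        // "öäü" written with escapes so the source file encoding does not matter.
        String text = "\u00f6\u00e4\u00fc";
        String xml = "<?xml version=\"1.0\" encoding=\"UTF-8\"?><p>" + text + "</p>";

        byte[] result = roundTrip(xml.getBytes(StandardCharsets.UTF_8));
        Document again = read(new ByteArrayInputStream(result));
        System.out.println("umlauts preserved: "
                + text.equals(again.getDocumentElement().getTextContent()));

        // Character count and UTF-8 byte count differ: 3 chars, 6 bytes.
        System.out.println(text.length() + " chars, "
                + text.getBytes(StandardCharsets.UTF_8).length + " bytes");
    }
}
```

If an intermediate String really is needed, the same idea applies: call getBytes(StandardCharsets.UTF_8) and write the whole resulting array, rather than using the String's length as the byte count.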