Posted to dev@poi.apache.org by Kristian Eide <kr...@medallia.com> on 2012/01/27 21:26:54 UTC

Issue with control characters in SXSSFWorkbook

I ran into the following problem after recently switching from HSSFWorkbook
to SXSSFWorkbook. Control characters appear to be written into the XML as
character references; even though they are properly escaped, such characters
are not legal in XML 1.0. Consider the following code:

	Workbook w = new SXSSFWorkbook();
	w.createSheet().createRow(0).createCell(0).setCellValue("\u0013");

This results in a file that cannot be opened in Excel. If we look at
'xl/worksheets/sheet1.xml' inside the archive we find:

	<c r="A1" t="inlineStr"><is><t>&#19;</t></is></c>

This is not legal XML 1.0. xmllint says: "parser error : xmlParseCharRef:
invalid xmlChar value 19".

The same code does work with both HSSFWorkbook and XSSFWorkbook, at least in
the sense that Excel can open the resulting file. With XSSFWorkbook the
string is stored in 'xl/sharedStrings.xml' like this:

	<si><t>?</t></si>

Notice that the control character has been replaced by '?'. I've resorted to
manually removing, in my program, the control characters that are not legal
in XML 1.0:

http://en.wikipedia.org/wiki/XML#Valid_characters
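A minimal sketch of that kind of workaround, assuming the goal is to drop
(rather than replace) every character outside the XML 1.0 "Char" production
(#x9 | #xA | #xD | [#x20-#xD7FF] | [#xE000-#xFFFD] | [#x10000-#x10FFFF]).
The class and method names (XmlSanitizer, stripInvalidXml10Chars) are
hypothetical and not part of POI; the cleaned string would then be passed to
setCellValue in place of the raw one.

	// Hypothetical helper, not a POI API: strips characters illegal in XML 1.0.
	final class XmlSanitizer {

	    // True if the code point matches the XML 1.0 "Char" production.
	    static boolean isValidXml10Char(int cp) {
	        return cp == 0x9 || cp == 0xA || cp == 0xD
	                || (cp >= 0x20 && cp <= 0xD7FF)
	                || (cp >= 0xE000 && cp <= 0xFFFD)
	                || (cp >= 0x10000 && cp <= 0x10FFFF);
	    }

	    // Returns a copy of s with every code point that XML 1.0 does not
	    // allow removed. Unpaired surrogates (U+D800..U+DFFF) are also
	    // dropped, since codePointAt returns them as-is and they fall
	    // outside the allowed ranges.
	    static String stripInvalidXml10Chars(String s) {
	        if (s == null) {
	            return null;
	        }
	        StringBuilder sb = new StringBuilder(s.length());
	        int i = 0;
	        while (i < s.length()) {
	            int cp = s.codePointAt(i);
	            if (isValidXml10Char(cp)) {
	                sb.appendCodePoint(cp);
	            }
	            i += Character.charCount(cp);
	        }
	        return sb.toString();
	    }
	}

	// Usage (assumes "cell" is a Cell from the example above):
	// cell.setCellValue(XmlSanitizer.stripInvalidXml10Chars("\u0013"));

To mimic the '?' substitution seen with XSSFWorkbook instead of deleting the
character, the helper could append '?' in an else branch; which behaviour is
preferable depends on whether silently losing characters is acceptable.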

I think this is something SXSSFWorkbook should handle internally, however.

Thanks!

--
Sent from the POI - Dev mailing list archive at Nabble.com.
