Posted to user@poi.apache.org by rajneesh sharma <ra...@gmail.com> on 2012/06/20 11:12:05 UTC
control characters does not gets handled properly
Hi All
I have two problems related to control character handling in POI.
1. Control characters are replaced by '?'. Ideally, I want the control
characters themselves to be shown.
2. The string pattern _x00HH_ gets converted to a control character (here
H is 0-9 or A-F).
I investigated the problem and found that the following facts cause it:
1. Excel internally uses OOXML, and XML does not support control characters.
2. To overcome this XML limitation, Excel escapes the control characters.
3. If such an escape sequence appears literally in a string, it also has to
be escaped so that it does not get converted to a control character.
Excel does this using the following logic:
# Excel escapes control characters with _xHHHH_ and also escapes any
# literal strings of that type by encoding the leading underscore. So
# "\0" -> _x0000_ and "_x0000_" -> _x005F_x0000_.
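The rule quoted above can be sketched in Java roughly as follows. This is
only an illustration of the escaping scheme as described, not POI's or
Excel's actual code: the class ExcelEscapes and its methods encode and
decode are hypothetical names, and leaving tab, line feed and carriage
return unescaped is an assumption (XML 1.0 allows those three characters).

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper for illustration; not part of the POI API.
final class ExcelEscapes {
    // Matches one escape sequence such as _x0000_ or _x005F_.
    private static final Pattern ESCAPE = Pattern.compile("_x([0-9A-Fa-f]{4})_");

    // Turns a raw Java string into the escaped form described above.
    static String encode(String raw) {
        StringBuilder out = new StringBuilder();
        for (int i = 0; i < raw.length(); i++) {
            char c = raw.charAt(i);
            if (c == '_' && ESCAPE.matcher(raw).region(i, raw.length()).lookingAt()) {
                // A literal "_xHHHH_" in the text: escape only its leading underscore.
                out.append("_x005F_");
            } else if (c < 0x20 && c != '\t' && c != '\n' && c != '\r') {
                // Assumption: tab, LF and CR are left alone, other control chars are escaped.
                out.append(String.format("_x%04X_", (int) c));
            } else {
                out.append(c);
            }
        }
        return out.toString();
    }

    // Reverses encode(): every _xHHHH_ sequence becomes the character U+HHHH.
    static String decode(String escaped) {
        Matcher m = ESCAPE.matcher(escaped);
        StringBuilder out = new StringBuilder();
        int last = 0;
        while (m.find()) {
            out.append(escaped, last, m.start());
            out.append((char) Integer.parseInt(m.group(1), 16));
            last = m.end();
        }
        out.append(escaped, last, escaped.length());
        return out.toString();
    }
}
```

With this sketch, encode("\0") gives _x0000_ and encode("_x0000_") gives
_x005F_x0000_, and decode() maps both back to the original strings. A
reader that applies only the decode step to unescaped text would turn a
literal _x0000_ into a NUL character, which matches problem 2 above.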
Is this a bug, or is it intended? If it is a bug, are you going to fix it
in a future release?
Regards
Rajneesh
Re: control characters does not gets handled properly
Posted by Nick Burch <ni...@alfresco.com>.
On 20/06/12 10:12, rajneesh sharma wrote:
> I have two problems related to control characters handling in POI.
> 1. control characters replaced by '?' , ideally I want to show control
> characters.
Are you sure they're being turned into question marks? That's often what
you get if you try to display a character that your current terminal
encoding + font doesn't support. Try checking in a debugger, or turning
the string into a char array and casting each char to an int to see what
code point it is.
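As a rough sketch of that check (CodePointDump and dump are made-up names
for illustration, and the string would come from wherever you read the
cell value), something like this prints each char as a number, so a real
'?' (U+003F) can be told apart from a control character that is merely
displayed as one:

```java
// Hypothetical debugging helper; not part of the POI API.
final class CodePointDump {
    // Returns each UTF-16 char of s as U+HHHH, separated by spaces.
    static String dump(String s) {
        StringBuilder out = new StringBuilder();
        for (char c : s.toCharArray()) {
            if (out.length() > 0) {
                out.append(' ');
            }
            // Casting the char to an int exposes its numeric value.
            out.append(String.format("U+%04X", (int) c));
        }
        return out.toString();
    }
}
```

For example, dump("A\u0001?") returns "U+0041 U+0001 U+003F": the middle
char is still the control character, whatever the terminal shows.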
> 1. Excel internally uses OOXML and XML does not support control characters.
> 2. To overcome this XML limitation Excel escapes the control character.
This doesn't sound Excel-specific; using entities and escapes is pretty
standard in XML processing for characters that aren't supported.
XMLBeans should be decoding any characters that get XML escaped/encoded,
so that you get a real character back. Are you sure you're not seeing this?
Nick